ABSTRACT

Storage of hard-copy archival paper documents requires a vast amount of storage space and time to search and retrieve. Technology exists today to convert hard-copy texts using optical scanners and store them in a digital format on optical disks.

This thesis conducts an in-depth review of current technology in optical scanners, optical storage media, and optical information systems.

Using the thesis documents presently stored in the library aboard the Naval Postgraduate School as a statistical base, this research analyzes the requirements to convert the thesis documents to digital format.

This thesis concludes that an image optical information system is a viable alternative to storing hard-copy documents and recommends follow-on thesis research to build an in-house optical information system.

I. INTRODUCTION

A. DISCUSSION

The reference room at the Knox Library aboard the Naval Postgraduate School is reaching its maximum capacity for shelving its archive documents. Technology exists today to convert hard-copy documents using optical scanners and store them in a digital format on optical disks.

Companies are finding economic as well as technical virtues to optical-disk technology that justify going optical. Some firms can cost-justify these systems by the space they save versus all other data storage media. The value of a document may be beyond measure, but a square foot of a floor space occupied by a filing cabinet certainly has its price. (Alter, 1988, p. 18)

Converting to digital format will require less storage space and provide faster search and retrieval. An optical-disk-based information system made by LaserData of Lowell, Mass., was installed at the Maine Medical Center in Portland, Maine, and allowed the hospital to clear an entire floor (7,200 square feet) of a building dedicated to medical records and radiology records. The hospital was out of storage space, so it was looking to recapture space rather than build new space.


Figure 1 Today's paper filing system. ("Wang Laboratories, Imaging - Primer Series," 1987, p. 4)

In Portland, building new storage space would have cost $100 per square foot. Recouping 7,200 square feet at $100 per square foot equals a savings of $720,000. That was the primary cost-justification. Another was being able to find records instantly, which improved overall patient care. (Alter, 1988, p. 18)

As another example of current use, USAA (United Services Automobile Association) set up a 1,300-workstation document-processing system to be shared by over 2,000 employees in its property and casualty policy-service operation.

According to Charles A. Plesums, manager of image systems, the company began processing 2 percent of that operation's document workload in mid-July 1988 and will expand to 100 percent by early 1989.

Seven years from now, he added, optical disks will store 300 million pages and save 39,000 square feet of space. (Lesher, 1988, p. 33)

This thesis will consider the analysis and design of a document processing system that takes documents from an Optical Scanner to an Optical Storage Medium.

B. SCOPE

This thesis includes an in-depth review of the Optical Scanner and Optical Storage Medium technologies presently available. The purpose of this review is to provide the reader with the different options available for designing an Optical Information System.

An analysis of the requirements for implementing an archival optical-disk-based document processing system will be conducted. Alternative systems and solutions will be addressed, and a recommendation will be submitted for possible implementation.

C. METHODOLOGY

Using the thesis texts presently stored in the library as a statistical population, a small sample will be used to conduct research on Optical Scanners. The questions to be answered concerning Optical Scanners are: 1) What is the time required to convert text to data? 2) How accurate is the conversion of hard-copy text to digital format? 3) What are the digital storage requirements? Optical Storage Media that best match the requirements of an Optical Scanner will also be reviewed. Thesis research is presently being conducted in the areas of indexing an optical disk using hypertext and of storage requirements using optical disks. This thesis will primarily emphasize Optical Scanners and the initial phase of conversion in an Optical Information System.


II. OPTICAL SCANNERS

A. INTRODUCTION

The initial phase of converting hard-copy text to digitized format in an Optical Information System is scanning the document, either with an Optical Character Reader or with an Image scanner.

Modern scanners can combine the functions of reading text and processing image information because they contain more, and more complex, components and algorithms than earlier scanners did. No scanner yet exists that can scan a page and interpret text and graphics in a single pass. However, software now exists that lets an image scanner scan for either text or graphics in each pass and then combine the two to produce a digitized copy of the original.

B. ELEMENTARY CONCEPT OF SCANNERS

A scanner obtains optical information (about light and dark areas on the image) from the original image. Next, the electronic converter units translate that optical image information into digital information. A processor unit manipulates the digital data according to specified instructions to create an output image that may differ in some way from the original image. Figure 2 compares how text and graphics are converted from a scanned image to digital format. The reader unit of a scanner combines a light source, several mirrors, and a lens. These components illuminate the original image and carry the light reflected from it. More light is reflected from lighter areas of the original than from darker areas.

A photoconverter converts information about the reflected light into an electrical voltage. An analog-to-digital converter then changes the electrical (analog) information into a digital (binary) data format.

The digital data is passed to an image processor, where it can be manipulated to produce the desired output. In the image processor, adjustments may be made to the size and shape, resolution, and contrast of the output image.

Figure 2 Text and Graphics Scanning Techniques. (PC Magazine, 1986, p. 134)


C. IN-DEPTH REVIEW OF HOW SCANNERS WORK

1. The Light Path

The original image is first illuminated by a light source in the reader unit of a modern scanner. Light reflected from the original image is passed by mirrors to a lens, which focuses the reflected light and passes it from the reader unit to the converter units.

Figure 3 illustrates the path that the light takes from the original image to the CCD (charge-coupled device), or photoconverter.

The steps listed above are essentially the same for both text and image processing. However, depending on the scanner's design, either the page is moved over a fixed scanning element or the scanning element is moved over a fixed page. Most OCRs have a fixed scanning element, and most image scanners have a moving scanning element.

The light source of the scanning element may be a laser or a high-intensity lamp. In an image scanner, the light source is mounted on a carriage so that it illuminates the original image not all at once but in a systematic manner. The carriage moves in the slow-scan direction, illuminating a strip of the original image with each movement. While a strip is illuminated, the fast scan occurs.

a. Slow Scan

In the slow-scan direction the light source moves stepwise to each strip of the original image, where it pauses while the fast scan takes place. The distance it moves depends on the resolution setting and corresponds to the height of a pixel in the output image.

b. Fast Scan

During the fast scan the light source pauses for a brief interval. Information from the illuminated strip of the original image is read and converted into digital data before it is processed. The illuminated strip is divided into discrete sections whose width is determined by the resolution. This width corresponds to the width of a pixel in the output image.

Light reflected from the original image is thus divided into discrete areas that are processed separately. Each is represented as a pixel in the output image.
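The slow-scan/fast-scan sequence can be sketched as a simple raster loop. This is a minimal illustration, not any scanner's firmware: it assumes the original image can be modeled by a hypothetical function `reflectance(x, y)` that returns a value from 0.0 (black) to 1.0 (white) at a point measured in inches. Both the slow-scan step and the fast-scan section width are taken to be 1/PPI inch, so each strip becomes one row and each section one pixel.

```python
from typing import Callable, List

def raster_scan(reflectance: Callable[[float, float], float],
                width_in: float, height_in: float, ppi: int) -> List[List[float]]:
    """Sample a page strip by strip (slow scan), dividing each strip
    into sections (fast scan); every section becomes one pixel."""
    step = 1.0 / ppi                      # pixel height and width, in inches
    rows = round(height_in * ppi)         # number of slow-scan steps
    cols = round(width_in * ppi)          # sections per illuminated strip
    image = []
    for r in range(rows):                 # slow scan: move the carriage one step
        y = (r + 0.5) * step              # center of the current strip
        strip = [reflectance((c + 0.5) * step, y) for c in range(cols)]  # fast scan
        image.append(strip)
    return image

# Hypothetical original: a black vertical bar 0.1 inch wide on a white page.
def bar(x: float, y: float) -> float:
    return 0.0 if 0.45 <= x < 0.55 else 1.0

img = raster_scan(bar, width_in=1.0, height_in=0.5, ppi=20)
print(len(img), len(img[0]))                               # 10 20
print("".join("#" if v < 0.5 else "." for v in img[0]))    # .........##.........
```

Doubling `ppi` doubles both the number of slow-scan steps and the number of fast-scan sections, which is why the pixel count grows with the square of the resolution.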


Figure 3 Scanned light path to the CCD. (XEROX 7650 Reference Manual, 1987)


A series of mirrors passes the reflected light from the original image to the lens. In this way the focal length of the reflected image is effectively made longer, which permits the use of a relatively small lens.

The lens, like the lens in a camera, focuses the reflected light from the final mirror onto a specific site (corresponding to a pixel) on the surface of the photoconverter.

2. Light Signal to Digital Signal

The converter units of a scanner contain electronic devices that convert, or transform, the light reflected from the original image into electronic data.

As the light reflected from the original image reaches the photocells on the surface of the CCD, they convert that optical signal into an electrical signal (a voltage) proportional to the "size" of the optical signal. The "size" of the optical signal is the amount of reflected light. That is, a white area of the original image reflects more light, so it generates a greater voltage.

The electrical signal requires one more transformation before the processor can understand its information. An analog-to-digital converter performs this final transformation and passes the digital signal to the processor.
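The photocell and analog-to-digital stages can be sketched as two small functions. The sketch assumes an idealized, linear photocell and a hypothetical full-scale voltage of 1.0 V; real converter units also have to deal with noise, offsets, and calibration, which are ignored here.

```python
def photoconvert(reflectance: float, full_scale_volts: float = 1.0) -> float:
    """Idealized photocell: output voltage proportional to reflected light
    (reflectance 1.0 = white, 0.0 = black)."""
    return max(0.0, min(1.0, reflectance)) * full_scale_volts

def adc(volts: float, bits: int, full_scale_volts: float = 1.0) -> int:
    """Idealized analog-to-digital converter: map 0..full scale onto the
    2**bits available codes, rounding to the nearest code."""
    top = 2 ** bits - 1
    code = int(volts / full_scale_volts * top + 0.5)
    return max(0, min(top, code))

# Lighter areas reflect more light, so they give a larger voltage and a larger code.
for r in (1.0, 0.5, 0.0):
    v = photoconvert(r)
    print(f"reflectance {r}: {v:.2f} V -> 8-bit {adc(v, 8)}, 1-bit {adc(v, 1)}")
```

With `bits=1` the converter reduces every pixel to black or white, which is the form most text pages are stored in; larger `bits` values preserve levels of gray.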

From this point on, the data manipulation processes for optical character readers and image scanners are very different.

D. OPTICAL CHARACTER READERS

1. What is OCR?

OCR stands for either optical character reader or optical character recognition, the process of converting an image of text into computer-readable form (i.e., ASCII).

The original concept of an OCR was a device that could digitize only characters produced by a typewriter. A new acronym, ICR, is being used by some vendors to replace OCR. ICR, defined either as Internal Character Recognition or Intelligent Character Recognition, includes the capability to recognize omni-font characters, that is, the many different fonts and characters produced by today's computers and printers. For the purposes of this thesis, OCR will be used to mean both OCR and ICR.

OCR is accomplished by analyzing the image of a character and then deciding what character the image represents. Unfortunately, OCR is not an exact science, and consequently any OCR process is inherently imperfect. Recognition errors will occur regardless of the particular OCR technology.

2. OCR Technology

Once the scanned image is converted into digital data, OCR scanners digitize the characters a line at a time and then isolate them, character by character, into frames ranging from 24 by 40 pixels up to 30 by 50 pixels. The individual frames are stored in RAM for character recognition processing.
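One simple way to isolate characters on a scanned line is to cut the line wherever a column contains no black pixels and then copy each piece into a fixed-size frame. This is an illustrative sketch only: it assumes characters never touch, places each character in the top-left corner of its frame, and takes the text's "24 by 40" to mean 24 pixels wide by 40 high. The function names are hypothetical.

```python
from typing import List

Bitmap = List[List[int]]   # 1 = black pixel, 0 = white

def split_characters(line: Bitmap) -> List[Bitmap]:
    """Isolate characters on one text line by cutting at columns that
    contain no black pixels (assumes characters do not touch)."""
    height, width = len(line), len(line[0])
    chars, start = [], None
    for x in range(width + 1):
        blank = x == width or all(line[y][x] == 0 for y in range(height))
        if not blank and start is None:
            start = x                                     # a character begins
        elif blank and start is not None:
            chars.append([row[start:x] for row in line])  # a character ends
            start = None
    return chars

def to_frame(char: Bitmap, frame_w: int = 24, frame_h: int = 40) -> Bitmap:
    """Copy a character into the top-left corner of a fixed-size, zero-padded
    frame (assumes the character fits in the frame)."""
    frame = [[0] * frame_w for _ in range(frame_h)]
    for y, row in enumerate(char):
        frame[y][:len(row)] = row
    return frame

line = [[int(c) for c in s] for s in ("1100111",
                                      "1000101",
                                      "1100111")]
chars = split_characters(line)
frames = [to_frame(c) for c in chars]
print(len(chars), len(frames[0]), len(frames[0][0]))   # 2 40 24
```

Fixed-size frames give every recognition step the same input shape, at the cost of wasted memory for small characters.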

There are two broad categories of character recognition processing commonly used in today's OCR/ICR scanners. The first, and perhaps the oldest, is commonly called Matrix Matching or Template Matching. The second, a more recent development, is referred to as Feature Extraction.

a. Matrix Matching

Matrix Matching, in its simplest form, can be thought of as comparing the image of an unknown character with images of known characters and finding the nearest match. Figure 4 illustrates how a scanned image is compared to a template.


Figure 4 Template Matching. Panels: the letter "A" as printed on paper; the template for the letter "A"; the scanned image of the character to be recognized; comparison of the scanned image with the "B" template (not recognized); comparison of the scanned image with the "A" template (recognized as "A"). (MICRO User's Guide, 1988, p. 20)

The very nature of this technique requires a complete set of templates for each font the system will read. This means that multi-font matrix matching systems need considerable memory for the font libraries.

Another disadvantage of matrix matching is its sensitivity to minor variations in fonts. Two fonts that look the same to a user may not be recognized equally well by matrix matching because of subtle differences in character shapes or sizes. On the positive side, matrix matching is relatively insensitive to broken characters, which occur all too frequently in ordinary documents.
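Matrix matching, including its tolerance of a broken character, can be illustrated with a toy example. The 5 by 5 templates for "A" and "B" below are hypothetical (real systems use larger frames and a full template set per font), and the nearest match is taken to be the template that differs from the unknown image in the fewest pixels.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]

def parse(rows: List[str]) -> Bitmap:
    """Turn '#'/'.' strings into a 1/0 bitmap."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]

# Hypothetical 5 x 5 templates for a single font.
TEMPLATES: Dict[str, Bitmap] = {
    "A": parse([".###.", "#...#", "#####", "#...#", "#...#"]),
    "B": parse(["####.", "#...#", "####.", "#...#", "####."]),
}

def mismatch(a: Bitmap, b: Bitmap) -> int:
    """Number of pixel positions where two equal-sized bitmaps differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def match(image: Bitmap) -> Tuple[str, int]:
    """Return the template character with the fewest mismatched pixels."""
    return min(((ch, mismatch(image, t)) for ch, t in TEMPLATES.items()),
               key=lambda pair: pair[1])

broken_a = parse([".###.", "#...#", "##.##", "#...#", "#...#"])  # gap in the crossbar
print(match(broken_a))   # ('A', 1)
```

The gap in the crossbar costs only one mismatched pixel against the "A" template (versus seven against "B"), so the broken character is still recognized. The same pixel-by-pixel comparison is what makes the method sensitive to a slightly different font: every shifted stroke adds mismatches.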

b. Feature Extraction

The term Feature Extraction is used in the industry to describe any OCR technique other than Matrix Matching. As a result, the name does not convey much information about how OCR is being done. Of the feature extraction techniques in use today, the most popular is Topological Feature Analysis.

Topological Feature Analysis involves identifying the important features of a character image and, based on these features, deciding what character the image represents. These features can include vertical strokes, horizontal strokes, line endings, closed curves, open curves, slanted strokes, intersections of strokes, and so on. Figure 5 illustrates the comparison between a scanned image and the primitive features extracted from it.

Figure 5 Feature Matching. (MICRO User's Guide, 1988, p. 23)


The nature of this technique makes it relatively insensitive to slight variations in character shape and size. Another advantage of feature extraction techniques is that, in most cases, less memory is required for the font libraries. For example, the features of the letter "e" are fairly constant across a wide variety of typefaces. A disadvantage of feature extraction is its sensitivity to broken characters.
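As an illustration of one topological feature from the list above, closed curves, the sketch below counts the enclosed background regions (holes) in a character bitmap. It is a minimal example under stated assumptions: '#' marks black pixels, background connected to the edge of the frame is treated as outside the character, and a real recognizer would combine this with many other features before deciding on a character.

```python
from typing import List

def count_holes(bitmap: List[str]) -> int:
    """Count closed curves as enclosed white regions ('.') in a character
    image, using 4-connected flood fill. White areas connected to the
    frame edge lie outside the character and are not holes."""
    h, w = len(bitmap), len(bitmap[0])
    seen = [[False] * w for _ in range(h)]

    def fill(y0: int, x0: int) -> None:
        stack = [(y0, x0)]
        while stack:
            y, x = stack.pop()
            if 0 <= y < h and 0 <= x < w and not seen[y][x] and bitmap[y][x] == ".":
                seen[y][x] = True
                stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])

    for y in range(h):                      # mark the outside background first
        for x in range(w):
            if y in (0, h - 1) or x in (0, w - 1):
                fill(y, x)
    holes = 0
    for y in range(h):                      # any white pixel left unmarked is inside a hole
        for x in range(w):
            if bitmap[y][x] == "." and not seen[y][x]:
                holes += 1
                fill(y, x)
    return holes

o = [".###.", "#...#", "#...#", "#...#", ".###."]
c = [".###.", "#....", "#....", "#....", ".###."]
e = [".###.", "#...#", "#####", "#....", ".###."]
print(count_holes(o), count_holes(c), count_holes(e))   # 1 0 1
```

The count does not depend on the exact thickness or size of the strokes, which reflects the insensitivity to shape variation described above. Because "o" and "e" both have one hole, further features (such as the horizontal stroke in "e") are needed to tell them apart. A break in the stroke of "o" would open the hole and change the count, which illustrates the sensitivity to broken characters.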

E. IMAGE SCANNERS

Image scanners work like laser printers in reverse. A scanner converts image information into electrical signals that can be stored in a computer, whereas a laser printer converts stored image information into charges on the surface of a photosensitive drum.

The electrical signals from the scanner's CCD (which reads an entire row at a time) are converted to numeric values and stored in RAM until the entire image is scanned. Figure 6 illustrates the digital data captured from a single strip of the letter "T" (one pixel in height) as it is read by the scanner.

This data is stored in RAM as a two-dimensional mosaic of dots that represents the original image. This two-dimensional mosaic is otherwise known as a bit map.
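In a black-and-white bit map each pixel needs only one bit, so a row of pixels can be packed eight to a byte. The sketch below assumes that 1 marks a black pixel and that the leftmost pixel goes into the most significant bit; both are conventions chosen for illustration, and the two 16-pixel strips of a "T" are made up rather than taken from Figure 6.

```python
from typing import List

def pack_row(pixels: List[int]) -> bytes:
    """Pack one row of a 1-bit bit map into bytes, 8 pixels per byte,
    leftmost pixel in the most significant bit; a short last byte is zero-padded."""
    out = bytearray()
    for i in range(0, len(pixels), 8):
        chunk = pixels[i:i + 8]
        byte = 0
        for p in chunk:
            byte = (byte << 1) | (p & 1)
        byte <<= 8 - len(chunk)          # pad a short final chunk on the right
        out.append(byte)
    return bytes(out)

# Two hypothetical 16-pixel strips of a letter "T" (1 = black).
top_bar = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
stem    = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
bitmap = [pack_row(top_bar), pack_row(stem)]
print([row.hex() for row in bitmap])   # ['3ffc', '03c0']
```

Each 16-pixel strip occupies two bytes, so a bit map's size in bytes is simply its pixel count divided by eight.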

Two parameters are very important in image scanning: resolution, usually expressed in pixels per inch (PPI), and the number of levels of grayscale information captured for each pixel.

Figure 6 Conversion of original image to digital information. (XEROX 7650 Reference Manual, 1987)

1. Resolution

Resolution is defined as the number of pixels read or displayed per inch (PPI), both horizontally and vertically.

Most image scanners have resolutions of up to 300 PPI, meaning that a single scanned line across the width of an 8- by 10-inch image contains 2,400 pixels (8 x 300). If the lengthwise resolution is also 300 PPI, there will be 3,000 pixels (10 x 300) in a single column the length of the image. In all, it takes about 7.2 million pixels (2,400 x 3,000) to represent an 8- by 10-inch image. At one bit per pixel (black and white only), an image that size requires 7.2 million bits, or 900,000 bytes: approximately 1 megabyte of storage.

Increasing the resolution allows more detail (finer lines or sharper changes in gray) to be resolved and improves the appearance of a scanned image. Figure 7 illustrates the difference in appearance of the letter "a" at different resolutions.

2. Levels of Grayscale

In a typical fine-grain photograph, the number of shades of gray that can be reproduced is essentially infinite, at least as far as the human eye can see. In the digital world, however, there is a limit to the number of shades of gray. Everything must be represented in discrete steps, and it takes more information (bits of data) to represent more steps: n bits can represent 2^n levels. Thus 4 bits of data are required to represent 16 levels of gray per pixel, 6 bits to represent 64 levels, and 8 bits to represent 256 levels.


Figure 7 Comparison of different PPI settings: the letter "a" produced at 300 ppi and reproduced at 75 ppi and at 150 ppi. (XEROX 7650 Reference Manual, 1987)

The higher the resolution of an image and the more levels of grayscale it contains, the higher the quality of the image. Unfortunately, the size of image files increases with the square of the resolution and linearly with the number of bits of grayscale information. One scanned page could easily require several megabytes of storage, as illustrated by Table I. Additionally, processing and retrieval time increase as the size of the stored data increases.
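The size rule above can be checked with a short calculation: the pixel count grows with the square of the resolution, and the bits per pixel grow with the base-2 logarithm of the number of gray levels. The sketch assumes an uncompressed bit map and 1 MB = 1,000,000 bytes. Under those assumptions it reproduces the 8- by 10-inch example (0.9 MB at one bit per pixel) and, after rounding, the 300- and 400-PPI rows of Table I; two 600-PPI entries come out at about 25.2 and 33.7 MB rather than the table's 25.3 and 33.6.

```python
import math

def bits_per_pixel(gray_levels: int) -> int:
    """Bits needed to distinguish the given number of gray levels (2 -> 1, 256 -> 8)."""
    return math.ceil(math.log2(gray_levels))

def image_megabytes(width_in: float, height_in: float, ppi: int, gray_levels: int) -> float:
    """Uncompressed bit-map size, assuming 1 MB = 1,000,000 bytes."""
    pixels = round(width_in * ppi) * round(height_in * ppi)   # grows with ppi squared
    return pixels * bits_per_pixel(gray_levels) / 8 / 1_000_000

print(image_megabytes(8, 10, 300, 2))          # 0.9
for ppi in (300, 400, 600):
    print(ppi, [round(image_megabytes(8.5, 11, ppi, g), 1) for g in (2, 16, 64, 256)])
```

Doubling the resolution from 300 to 600 PPI multiplies each entry by four, while going from 2 to 256 gray levels multiplies it by eight.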

F. COMPRESSION/DECOMPRESSION OF A SCANNED IMAGE

Because of the extremely large memory requirements of scanned images, it was necessary to develop compression/decompression techniques to increase the number of pages that could be stored on a particular device.

Table I FILE SIZE (IN MEGABYTES) FOR AN 8 1/2 X 11 INCH SCANNED IMAGE. (DESKTOP PUBLISHING, FALL 1988, P. 29)

Resolution of                Grayscale Levels
scanned image (PPI)      2       16       64      256
      300               1.1      4.2      6.3      8.4
      400               1.9      7.5     11.2     15.0
      600               4.2     16.8     25.3     33.6

Until compression/decompression chips became available, image compression was performed in software, taking at least 30 seconds for a typical page. Now, with image compression/decompression processor chips, such operations take only a few seconds.

Most image compression processors are based on the CCITT Group 3 and Group 4 standards, developed for use in facsimile transmission. These standards are based on a combination of two image compression algorithms, known as Modified Huffman (MH) and Modified READ (MR) encoding. (Matlin, 1988, p. 75)

1. Modified Huffman Encoding

Also known as one-dimensional encoding, MH works on an image one horizontal line at a time. Each run length, or continuous string, of black or white pixels is given a code based on the probability of that particular length. To achieve image compression, the codes for the most probable run lengths must be shorter than the run lengths themselves.
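The run lengths that MH encodes can be extracted from a scan line as sketched below. The sketch assumes 0 = white and 1 = black, and follows the CCITT Group 3 convention (not described above) that every line begins with a white run, so a line starting with black gets a white run of length zero.

```python
from itertools import groupby
from typing import List

def run_lengths(row: List[int]) -> List[int]:
    """Alternating white/black run lengths for one scan line (0 = white, 1 = black).
    The first run is always white, so a line that starts with black
    begins with a white run of length 0."""
    runs = [len(list(group)) for _, group in groupby(row)]
    if row and row[0] == 1:
        runs.insert(0, 0)
    return runs

print(run_lengths([0, 0, 0, 1, 1, 0, 0, 0, 0, 1]))   # [3, 2, 4, 1]
print(run_lengths([1, 1, 0]))                        # [0, 2, 1]
```

Because the colors always alternate starting with white, the encoder never has to send the color of a run, only its length.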

The CCITT Group 3 standard employs MH encoding. In this algorithm, the codes representing the document run lengths are selected from one of two 64-element tables (the terminating codes) representing black and white run lengths of 0 to 63 pixels. These tables were derived from statistics based on eight standard documents, which are available from the CCITT. For longer run lengths, make-up code tables exist for run lengths in multiples of 64 pixels.

As an example, if an entire scan line 8 1/2 inches wide is white and the document is scanned at 300 PPI, a white run of 2,544 pixels is indicated (8.5 x 300 = 2,550 pixels, rounded down to a whole number of bytes: 318 x 8 = 2,544). This run length can be represented by a make-up code for 2,496 pixels (000000011110), a white terminating code for 48 pixels (00001011), and an end-of-line code (000000000001). Since only 32 bits (12 + 8 + 12) are required to represent the original 2,544 pixels, a compression ratio of 79.5:1 is achieved for this line. Of course, this is a particularly easy line to compress.
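The worked example can be reproduced in a few lines. The code tables here are deliberately partial: they contain only the three codewords quoted above, so the hypothetical `encode_white_run` function handles this one example and would need the full CCITT make-up and terminating tables to encode arbitrary lines.

```python
# Only the three codewords quoted in the text; a real encoder needs the full CCITT tables.
WHITE_MAKEUP = {2496: "000000011110"}
WHITE_TERMINATING = {48: "00001011"}
EOL = "000000000001"

def encode_white_run(n: int) -> str:
    """Encode one white run as a make-up code (multiple of 64, if needed)
    followed by a terminating code (0-63)."""
    code = ""
    makeup = (n // 64) * 64
    if makeup:
        code += WHITE_MAKEUP[makeup]      # KeyError if absent from this partial table
    code += WHITE_TERMINATING[n % 64]
    return code

pixels = 2544                        # 8 1/2 inches at 300 PPI, rounded down to whole bytes
line_code = encode_white_run(pixels) + EOL
ratio = pixels / len(line_code)      # one bit per pixel before coding
print(len(line_code), ratio)         # 32 79.5
```

Splitting 2,544 as 39 x 64 + 48 is what selects the 2,496-pixel make-up code and the 48-pixel terminating code.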

2. Modified READ Encoding

Also known as two-dimensional encoding, MR coding takes advantage of the vertical correlation between adjacent lines within a document. It has been estimated that 50 percent of all transitions from white to black, or vice versa, occur directly below a transition on the previous line. To encode using the MR algorithm, the relationship between a transition on the current line and the transitions on the previous line is determined. If the current-line transition is within three pixels of a transition on the previous line, vertical mode is indicated. This case is represented by a short code indicating vertical mode and another code indicating the relative distance between the current-line transition and the transition above it. If the distance between transitions is more than three pixels, the pixel distance is encoded using the appropriate MH code. This is known as horizontal mode. A third technique, known as pass mode, is used to realign the transition pointers between the coding and reference lines.
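The mode decision described above can be sketched as follows. This is a simplification under stated assumptions: the positions a1 (the next transition on the coding line) and b1, b2 (the next two transitions on the reference line) are taken as already located, whereas the standard defines exactly how they are found; pass mode is chosen when the reference line's b2 lies to the left of a1.

```python
def mr_mode(a1: int, b1: int, b2: int) -> str:
    """Pick the two-dimensional coding mode for the next transition.
    a1: next transition on the coding (current) line.
    b1, b2: next two transitions on the reference (previous) line,
    assumed to have been located already."""
    if b2 < a1:
        return "pass"                      # the run above ends before a1: realign pointers
    offset = a1 - b1
    if abs(offset) <= 3:
        return f"vertical({offset:+d})"    # short code for the offset from the transition above
    return "horizontal"                    # fall back to MH codes for the run lengths

print(mr_mode(101, 100, 140))   # vertical(+1)
print(mr_mode(120, 100, 140))   # horizontal
print(mr_mode(150, 100, 140))   # pass
```

Since about half of all transitions line up with the one above, the short vertical-mode codes are used often, which is where MR gains over one-dimensional MH coding.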

III. OPTICAL STORAGE MEDIA

A. INTRODUCTION

Before optical storage, it was difficult to have video, audio, image, and text data on-line because of the large memory required to store the various types of data. With the advent of optical storage media, different forms of information can be digitized, integrated, and displayed as a single form of information.

The extremely high-density recording capability of optical devices enables one 5 1/4-inch optical disk drive to store 654 million bytes (654 megabytes) of information. That is equivalent to the data contained on about 1,800 360K floppy disks, 33 20-megabyte hard disks, or 260,000 pages of text. A single 12-inch optical platter can store as much as four GB (gigabytes) of information. Four GB, or four billion bytes, is equal to the data stored in 160 file cabinets or on 120 2,400-foot magnetic tapes (Dukeman, 1988, p. 82). Larger discs are available that contain even more data. For example, Eastman Kodak Company recently introduced an optical system that can store more than a terabyte of information. One terabyte is equal to a trillion bytes. The Kodak System 6800 uses 14-inch optical discs that each store 6.8 gigabytes (billion bytes) of randomly accessible information. The automated library unit can accommodate as many as 150 discs, and 150 times 6.8 billion yields a figure in excess of one trillion. ("CD-ROMs: The Laser's Edge in Data Storage," 1987, p. 53)
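The capacity comparisons above are simple ratios and can be checked directly. The sketch counts a 360K floppy as 360 x 1,024 bytes and everything else in decimal units; those unit choices are assumptions, and with decimal kilobytes the floppy figure would be about 1,817 instead of about 1,774, both consistent with the rounded figure of 1,800. The bytes-per-page value is simply what the 260,000-page figure implies.

```python
KB = 1_024                     # binary kilobyte for the 360K floppy (assumption)
MB = 1_000_000
GB = 1_000 * MB
TB = 1_000 * GB

optical_525 = 654 * MB
floppies = optical_525 / (360 * KB)        # 360K floppy disks
hard_disks = optical_525 / (20 * MB)       # 20-megabyte hard disks
bytes_per_page = optical_525 / 260_000     # implied text per page

kodak_6800 = 150 * 6_800 * MB              # 150 discs of 6.8 GB each
print(round(floppies), round(hard_disks), round(bytes_per_page))   # 1774 33 2515
print(kodak_6800 / TB)                                             # 1.02
```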

Compared with magnetic disks and tapes, optical media is almost indestructible. Optical disks can be mailed without special precautions and taken through X-ray machines and airport scanning devices. Optically stored data is unaffected by the environment or magnetic fields. Some optical media last 30 to 100 years, whereas magnetic media have an average life expectancy of only three to five years.

Optical disks are removable, so the data can be securely stored. Optical disks don't stretch over time as magnetic tapes do. Most optical media can't be altered, and optical media is less expensive per megabyte of storage. Data access time for optical disks is still slower than for magnetic disks, but as the product matures and proliferates in its target markets of data distribution, publishing, database archiving, and imaging, access time will improve. Paper-intensive environments should accept longer access times in exchange for large capacity, unattended backup capability, and high-volume storage of integrated data, text, and images. (Levine, R., 1988, p. 50)

B. ELEMENTARY CONCEPT OF OPTICAL DISKS

Information is recorded on a plastic-coated glass disc in the form of pits and lands (flats). Pits are indented 0.12 micrometer deep and 0.6 micrometer wide into the surface of the disc. Flats measure between 0.9 and 3.3 micrometers in length.

Data on the CD-ROM disk is arranged in a spiral pattern, radiating from the center toward the outer edge. A space of 1.6 micrometers separates adjacent turns of the spiral. This configuration yields an effective track density of about 16,000 tracks per inch. In contrast, floppy disks have a density of 96 tracks per inch.
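The track-density figure follows from the 1.6-micrometer spacing and the 25,400 micrometers in an inch; the comparison with a 96-track-per-inch floppy is a simple ratio:

```python
MICROMETERS_PER_INCH = 25_400
track_pitch_um = 1.6            # spacing between turns of the spiral

tracks_per_inch = MICROMETERS_PER_INCH / track_pitch_um
print(round(tracks_per_inch))        # 15875, i.e. roughly 16,000
print(round(tracks_per_inch / 96))   # 165: about 165 times a 96-TPI floppy
```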

Before data is inscribed on the disc, it must first be translated into a special dialect of the binary channel code used to transfer data between more familiar magnetic formats and computer devices. In magnetic tape formats, the ones and zeros represent digital information. CD-ROM channel code instead assigns a one to mark a movement from a land to a pit or from a pit to a land, and a zero to mark a continuation of a land or a pit.

A low-power laser is used to read data from the disc surface. Light rays are aimed by an optical head over the information track on the spinning disc, and the amount of light reflected back to the optical head indicates the presence of a flat, which reflects more light, or a pit, which reflects less light. The sequence of flats and pits is then decoded into data the computer can use. ("CD-ROMs: The Laser's Edge in Data Storage," 1987, p. 52)
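The transition rule of the channel code and the reflectivity-based reading described above can be sketched as a round trip. This is a simplified model: each channel bit occupies one cell along the track, 'L' is a land and 'P' a pit, and the actual CD modulation (eight-to-fourteen modulation, with its run-length limits) is not modeled.

```python
from typing import List

def bits_to_surface(channel_bits: List[int], start: str = "L") -> str:
    """Lay channel bits along the track, one cell per bit ('L' = land,
    'P' = pit): a 1 switches between land and pit, a 0 continues the current one."""
    state, cells = start, []
    for bit in channel_bits:
        if bit == 1:
            state = "P" if state == "L" else "L"
        cells.append(state)
    return "".join(cells)

def surface_to_bits(cells: str, start: str = "L") -> List[int]:
    """Reader side: lands reflect more light than pits, so a change in
    reflected light from one cell to the next is read as a 1."""
    bits, previous = [], start
    for cell in cells:
        bits.append(1 if cell != previous else 0)
        previous = cell
    return bits

channel_bits = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
surface = bits_to_surface(channel_bits)
print(surface)                                      # PPPLLLLPPPL
print(surface_to_bits(surface) == channel_bits)     # True
```

Only the edges of pits carry ones, so the reader needs to detect changes in reflected light rather than measure an absolute level.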

C. CD-ROM: COMPACT DISK-READ ONLY MEMORY

CD-ROM offers prerecorded optical storage. It is a read-only device: you can read the information on the disk, but it can't be altered. Used to distribute a common database that doesn't have to be updated constantly to multiple divisions, departments, or branch offices, it ensures that the data is protected against tampering or accidental erasure, and it is ideal for archival purposes.

Most systems of this type store 600 MB (megabytes) of data on a 4.7-inch CD-ROM disk and drive. Half-height 5 1/4-inch drives are also available. CD-ROM manufacturers have embraced the High Sierra Group format and the ISO 9660 standard for file organization; thus the 4.7-inch drive has become the industry standard. (Levine, R., 1987, p. 50)

CD-ROM is the most economical optical medium for mass distribution of databases. Preparing a master disk is relatively expensive, but after mass production the cost per copy can be as low as $2 in a large-scale distribution.

For CD-ROM, optical disks are mass-produced in the same way regardless of whether the encoded data represent video, audio, or text. Once the information has been transcribed into digital format and the special cue codes have been added, all the data is transferred to a master tape.

Once the information is recorded on the plastic-coated glass disc, the glass disc is used to create a metalized master disc. The surface of the master is transferred onto nickel shells to form negatives and positives, from which 'stamper' copies are made for mass replication. The stamper is used to transfer the information onto the replica discs, which are coated with reflective aluminum and then covered with lacquer.

D.   WORM:  WRITE-ONCE,  READ-MANY 

These  optical  storage  devices  permit  one-time  writing 
but  unlimited  reading  of  data  and  images.   Although  you 
can't  overwrite  or  erase  previously  stored  data,  you  can 
update  it  by  writing  new  information  into  a  file  at  another 
location  on  the  disk.   The  new  file  then  is  linked  to  the 
original  file  through  software  and  is  retrieved  in  its 
place.   This  operation  is  transparent  to  the  user. 
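The update scheme just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's software: the class and method names (`WormDisk`, `FileTable`, `save`, `load`, `history`) are hypothetical, and the file table is kept in memory for simplicity, whereas a real WORM system would also record its directory on the disk.

```python
class WormDisk:
    """Hypothetical write-once medium: records can be appended, never changed."""

    def __init__(self):
        # Each entry is (data, address of the version this record replaces).
        self._records = []

    def append(self, data, supersedes=None):
        self._records.append((data, supersedes))
        return len(self._records) - 1  # address of the newly written record

    def read(self, address):
        return self._records[address][0]

    def previous(self, address):
        return self._records[address][1]

    def __len__(self):
        return len(self._records)


class FileTable:
    """Hypothetical software layer linking each file name to its newest version."""

    def __init__(self, disk):
        self.disk = disk
        self.latest = {}  # file name -> address of newest version

    def save(self, name, data):
        # Never overwrite: write a new record that points back at the old one.
        old = self.latest.get(name)
        self.latest[name] = self.disk.append(data, supersedes=old)

    def load(self, name):
        # The user transparently receives the newest version.
        return self.disk.read(self.latest[name])

    def history(self, name):
        # Walk the links from newest to oldest; old versions remain on disk.
        versions = []
        address = self.latest.get(name)
        while address is not None:
            versions.append(self.disk.read(address))
            address = self.disk.previous(address)
        return versions


if __name__ == "__main__":
    table = FileTable(WormDisk())
    table.save("thesis-0001", "original scan")
    table.save("thesis-0001", "rescanned page 12")
    print(table.load("thesis-0001"))     # rescanned page 12
    print(table.history("thesis-0001"))  # ['rescanned page 12', 'original scan']
```

Because nothing is ever overwritten, every earlier version stays readable, which is one reason WORM media suit archival work.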

WORM optical technology uses a high-intensity laser
beam that heats and permanently changes the surface of the
disk as it writes and stores information on the disk.
The writing process, which varies from vendor to vendor,
ultimately results in a change in reflectivity of the
information layer of the disk. Figure 8 illustrates how a
WORM drive works.

Most  WORM  drives  use  a  glass-  or  plastic-based  substrate 
to  enclose  a  sensitive  recording  layer.   Eastman  Kodak  Co. 
uses  an  aluminum  substrate  in  its  14-inch  Optical  Disk 
System  6800. 
Figure 8    How a WORM drive works. (PC Magazine, June 1987)

Presently, three recording technologies are used for
WORM optical disks: ablative, vesicular and phase-change.

1. Ablative Recording

Ablative  technology  stands  out  as  the  most  common 
method  of  writing  data.   This  technology,  also  known  as  pit- 
forming,  burns  a  hole  in  the  active  layer  of  the  media. 

2. Vesicular Recording

Vesicular technology, also known as bubble-forming,
heats the media until it melts and forms a bubble, or
explosion, of the polymers on the active layer of the media.

3. Phase-Change Recording

Phase-change technology actually produces a change
in the media from a crystalline to an amorphous state.

E. MAGNETO-OPTICAL DISKS: ERASABLE OPTICAL STORAGE

Magneto-Optical disks provide the same capability of
storing and retrieving data as present magnetic drives do,
but with 12 to 50 times the storage capacity of current
magnetic hard disk drives.

Magneto-Optical disk drives use a combination of
technologies  to  store  and  retrieve  information.   They  rely 
upon  materials  whose  particles  can  be  magnetically  oriented 
either  up  or  down  but  whose  orientation  can't  be  changed 
easily  at  normal  temperatures. 

Storing  information  on  the  disk  is  performed  by  a  strong 
laser  beam,  as  illustrated  in  Figure  9,  which  heats  a 
microscopic  spot  in  a  multi-layered  material  sealed  in  the 
rotating  disk.   When  the  temperature  of  the  magneto-optical 
layer  reaches  a  certain  point,  its  magnetic  orientation  can 
be  changed  easily  by  a  magnetic  field  in  the  drive. 

After  the  laser  beam  is  removed,  the  exposed  disk  region 
retains  its  magnetized  orientation. 
Figure 9    Magneto-Optical disk concept. (Infoworld, 1988)
To record information, a first pass is made with the
laser  and  magnetic  field  to  erase  an  entire  section  of  the 
recording  surface  by  orienting  all  the  spots  the  same  way, 
to  represent  zeros.   Then,  a  second  pass  is  made  with  the 
magnetic  field  reversed,  but  this  time  the  laser  heats  only 
spots  to  be  changed  from  zero  to  one. 
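The two-pass recording cycle can be modeled with a list of spot states. This is a simplified sketch, assuming each spot is just 0 or 1 and ignoring the optics and magnetics entirely; the function names are hypothetical. It shows why the second pass needs to heat only the spots that become 1.

```python
def erase_pass(spots, start, length):
    """First pass: the laser heats every spot in the section while the
    magnetic field points one way, leaving all of them as 0."""
    for i in range(start, start + length):
        spots[i] = 0


def write_pass(spots, start, bits):
    """Second pass: field reversed; the laser heats only the spots that
    must become 1. Returns how many spots were heated."""
    heated = 0
    for offset, bit in enumerate(bits):
        if bit == 1:
            spots[start + offset] = 1
            heated += 1
    return heated


def record(spots, start, bits):
    # Erase the whole section, then write the 1s.
    erase_pass(spots, start, len(bits))
    return write_pass(spots, start, bits)


if __name__ == "__main__":
    surface = [1, 0, 1, 1, 0, 0, 1, 0]   # previously recorded spots
    heated = record(surface, 2, [0, 1, 0, 1])
    print(surface)  # [1, 0, 0, 1, 0, 1, 1, 0]
    print(heated)   # 2
```

Spots outside the section being rewritten are left untouched, and the previous contents of the section do not matter because the first pass resets them.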

To read the information later, as illustrated in Figure
9, a weaker, polarized laser beam is shone at the spot.
Depending on the direction of the magnetization of the
recording layer, the plane of polarization of the reflected
beam is rotated slightly in one direction or the other, a
phenomenon known as the Kerr effect.

After  striking  the  surface,  the  polarized  beam  is 
reflected  back  to  a  photodetector,  which  reads  the 
variations.   With  the  stronger  beam,  the  information  can 
later  be  "erased"  by  again  heating  the  spot  and  altering  the 
magnetic  orientation. 

F.   SUMMARY 

When  designing  an  archival  information  system,  the 
optical media of choice is WORM. For CD-ROM, the cost of
producing the master disk would be prohibitive unless a
large distribution base made it cost effective.
Magneto-optical technology is still under development, but
it does provide an alternative to WORM if there is a need to
erase the original disk. However, magneto-optical disks are
more expensive than WORM disks, leaving WORM as the economic
medium of choice for archival purposes.
IV. OPTICAL INFORMATION PROCESSING

Optical  information  processing  is  a  relatively  new 
technology,  utilizing  optical  storage  media  to  store  the 
data  created  by  a  document  data  processing  system. 

A.   DOCUMENT  DATA  PROCESSING 

What  is  Document  Data  Processing?   Document  Data 
Processing  is  the  procedure  of  converting  information  stored 
on  paper  to  digitized  format.   Document  data  processing 
includes  the  ability  to  electronically  store,  retrieve  and 
reproduce  the  original  information  contained  on  paper. 

Document  data  processing  systems  in  the  past  have  used 
optical  character  readers  to  convert  paper  information  to 
electronic  format.    Microfiche  or  magnetic  storage  devices 
were  used  to  store  the  electronically  converted  data. 

In the past, the high expense and relatively low capacity
of magnetic media have precluded their use for storing
archival quantities of documents in other than
character-coded format. (Kapoor, 1988, p. 28)

Before the advent of optical disks, it was impractical
to maintain images on-line because of the large memory
requirements of storing a single page. (Grigsby, 1988,
p. 62)

With  the  advancements  in  optical  storage  media  and  the 
techniques  to  compress  images  into  manageable  sizes, 
technology  exists  today  to  design  a  document  data  processing 
system  that  will  merge  and  manage  diverse  forms  of 
information,  including  image,  text,  alphanumeric  data  and 
voice.
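One compression idea behind manageable image sizes is run-length coding: a scanned text page is mostly long runs of white pixels, so a scan line can be stored as the lengths of alternating white and black runs. The sketch below is a simplified illustration of the principle used in CCITT Group 3 one-dimensional fax coding; the actual standard further replaces each run length with a Modified Huffman codeword, which is omitted here. Function names are hypothetical.

```python
def run_lengths(scan_line):
    """Encode one bilevel scan line (0 = white, 1 = black) as alternating
    run lengths, always starting with a white run (possibly of length 0)."""
    runs = []
    current = 0  # lines are assumed to start in the white state
    count = 0
    for pixel in scan_line:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs


def expand(runs):
    """Decode alternating run lengths back into pixels."""
    line = []
    color = 0
    for length in runs:
        line.extend([color] * length)
        color ^= 1  # runs alternate white, black, white, ...
    return line


if __name__ == "__main__":
    line = [0] * 10 + [1] * 3 + [0] * 7
    print(run_lengths(line))  # [10, 3, 7]
```

Twenty pixels shrink to three numbers here; on a real page at 200 PPI a mostly blank line of 1,700 pixels collapses to a handful of runs, which is why compressed bilevel pages are far smaller than raw bitmaps.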

B.   OPTICAL  INFORMATION  SYSTEM 

Optical information processing systems provide both an
image and a data processing solution. These digital systems,
which use optical storage media to store and retrieve
information, are the missing link in the integration of paper
documents, microfilm, computer data, and word processing text.
This technology provides solutions, not previously available,
to the information access and distribution requirements
associated with a total information transaction. (Grigsby,
1988, p. 60)

Optical information systems are an ideal alternative to
document data processing systems. Optical disks store not
only images and data but also the retrieval software for
index data management.
Figure 10    Optical Information System.

A basic stand-alone PC-based optical information
processing system consists of an image scanner to create
digitized images of documents; workstations with a
bit-mapped, high-resolution monitor to display the images;
magnetic hard disks to store indexed information and act as
a buffer before the digitized image is stored; optical disks
to store and retrieve the images; and a laser printer to
reproduce the image.

These stand-alone systems are modular and can be
expanded into large, organization-wide document image and
data management systems with multiple workstations and
optical disk storage libraries.

When the optical storage medium is connected to a network,
several people can review the stored document
simultaneously, eliminating the delays that result from
passing a paper file from person to person. This process
allows the actual paper file to be stored in a low-cost,
secure area. Paper files, which are subject to theft and
accidental loss, often must be sent out for review. Each
journey risks the integrity of the file and costs additional
time and expense.

An  optical  storage  system  may  include  jukeboxes. 
Optical  jukeboxes  use  robotics  to  mount  and  dismount  a  large 
number  of  optical  disks.   A  jukebox  may  contain  as  many  as 
95  disks  and  up  to  five  separate  drives,  yielding  quick 
access  to  over  three   million  images.   When  an  image  is 
requested,  the  correct  optical  disk  is  robotically  selected 
and  mounted  and  the  desired  image  displayed  in  seconds. 
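How a jukebox decides which disk to mount can be sketched as follows. This is a hypothetical model, not any vendor's design: it assumes an index maps each image to the disk slot holding it, that up to `drives` disks (five in the text's example) can be mounted at once, and that when every drive is busy the least recently used disk is returned to its slot. The LRU policy and all names are assumptions for illustration.

```python
from collections import OrderedDict


class Jukebox:
    """Hypothetical jukebox model with least-recently-used drive allocation."""

    def __init__(self, image_index, drives=5):
        self.image_index = image_index  # image id -> disk slot number
        self.drives = drives
        self.mounted = OrderedDict()    # disk slot -> None, oldest use first
        self.robot_moves = 0            # mounts plus dismounts performed

    def fetch(self, image_id):
        disk = self.image_index[image_id]
        if disk in self.mounted:
            # Already in a drive: no robot motion, just mark it recently used.
            self.mounted.move_to_end(disk)
        else:
            if len(self.mounted) == self.drives:
                # All drives busy: return the least recently used disk.
                self.mounted.popitem(last=False)
                self.robot_moves += 1
            self.mounted[disk] = None
            self.robot_moves += 1       # robot mounts the requested disk
        return disk


if __name__ == "__main__":
    box = Jukebox({"a": 1, "b": 2, "c": 3}, drives=2)
    for image in ["a", "b", "a", "c"]:
        box.fetch(image)
    print(list(box.mounted), box.robot_moves)  # [1, 3] 4
```

Requests for images on an already mounted disk cost no robot motion, which is where the quick access in the text comes from when related images share a disk.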

C.   BENEFITS  OF  OPTICAL  INFORMATION  PROCESSING 

A primary reason for using computer paper, computer
output microfiche, and magnetic tape for archival storage is
low cost. Until now, it has been too expensive to keep
massive amounts of data "on-line". Optical information
systems provide on-line desktop delivery of current, as well
as historical, computer information. (Grigsby, 1988, p. 62)

1. Huge Capacity/Space Savings

Dozens of comparisons have been made to dramatize
the optical disk system's ability to compress volumes of
paper onto a disk. This ability provides high-density
on-line storage at a comparatively low cost, as demonstrated
by the following table:

Table II  STORAGE CAPACITY AND COST COMPARISON OF DIFFERENT
STORAGE MEDIA. (MICROSOFT PRESS, 1986)

MEDIA            CAPACITY (MB)    COST PER MB
Floppy Disk      0.36-1.2         1093.59
Hard Disk        5-50               63.63
Magnetic Tape    30-300             54.64
CD-ROM           540-680             2.48
WORM             200-300            17.40
Large Optical    1,000-4,000        21.41


2. Speed of Retrieval

"In  a  current  environment,  even  in  a  hurry,  it  may 
take 10 minutes to retrieve a paper document, copy it, add
notes, and fax it to a remote office," says Michael Florio,
vice president of Document Technology, Inc. (Dukeman, 1988,
p. 82). With an optical information processing system, the
process would take only 30 seconds, drastically reducing the
clerical functions of research, filing and hard-copy
reproduction. In another example, with an optical disk
jukebox storing 280 gigabytes, a document can be accessed in
7 to 10 seconds (Dukeman, 1988, p. 82).

3.  Shared  Access/Remote  Availability 

Information  on  paper  can  be  in  only  one  place  at  a 
time,  or  it  can  be  copied  and  multiplied  beyond  control.   In 
an  optical  information  system  only  one  copy  of  an  image 
exists,  but  users  have  access  on  an  "as  needed"  basis. 

4. File Integrity

File integrity is another significant reason that
automation of paper documents is important. Many documents
are simply lost or unavailable due to misfiling or
out-of-file situations. With WORM optical disks, a document
cannot be misplaced or altered once it has been written.
Additionally, when connected to a network, several people
can review the stored document simultaneously, eliminating
the delays that result from passing a paper file from person
to person.

5. Archival Life

The storage life of optical media is estimated at more
than 30 years, although the data can be duplicated onto new
media at more frequent intervals. The life expectancy of
optical disks far surpasses the estimated life of 5 to 10
years for magnetic storage. At the 1988 AIIM show in Chicago,
Sony Corporation announced that accelerated tests showed that
Sony WORM media is capable of a one-hundred-year life
(Dukeman, 1988, p. 84).

6. Cross Reference Indexing

Once cataloged in a multiple-level cross-reference
index, images can be retrieved by any of a number of
fields.   Indexing  multiple  key  fields  allows  greater 
flexibility  for  accessing  data. 
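A multiple-key index of this kind can be sketched as one lookup table per indexed field, with a query intersecting the matches. This is a minimal illustration under assumptions: the field names, the (disk, image number) location format, and the class name `CrossReferenceIndex` are all hypothetical, not taken from any particular system.

```python
from collections import defaultdict


class CrossReferenceIndex:
    """Hypothetical multi-key index: each field value maps to the set of
    image locations (disk, image number) that carry that value."""

    def __init__(self, fields):
        self.tables = {field: defaultdict(set) for field in fields}

    def add(self, location, record):
        # Register the image under every indexed field present in the record.
        for field, table in self.tables.items():
            value = record.get(field)
            if value is not None:
                table[value].add(location)

    def find(self, **criteria):
        """Return the locations matching every field=value pair given."""
        result = None
        for field, value in criteria.items():
            matches = self.tables[field].get(value, set())
            result = set(matches) if result is None else result & matches
        return result or set()


if __name__ == "__main__":
    idx = CrossReferenceIndex(["author", "year", "subject"])
    idx.add((1, 10), {"author": "Smith", "year": 1988, "subject": "optical"})
    idx.add((1, 11), {"author": "Jones", "year": 1988, "subject": "optical"})
    idx.add((2, 5), {"author": "Smith", "year": 1987, "subject": "networks"})
    print(idx.find(author="Smith", year=1988))  # {(1, 10)}
```

The sample records are invented. Adding a field to the index costs one more table, while every query can combine any subset of the indexed fields.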

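7. No Head Crashes

Unlike magnetic disks, optical disks do not
experience head crashes, because the optical head reads and
writes with a focused laser beam from a distance rather than
flying just above the disk surface.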
8. Distribution

By coupling an optical system with a communication
device, users can send and receive documents in seconds.
Adding a facsimile capability enables the system to send a
document image virtually anywhere there is another fax
machine.
V. METHODOLOGY AND DATA

A.   INTRODUCTION 

In searching for hardware to evaluate for an optical
information system, the researcher acquired data from three
sources: 1) in-place operational systems, such as the EDS
DEERS Enrollment Processing Center, which uses a 3M Docutron
9000 optical information system, and the Defense Language
Institute Graphics Department, which uses a Kurzweil 4000
optical character recognition system, both in Monterey, Ca.;
2) companies that specialize in optical information system
integration, such as TAB Products Co., Palo Alto, Ca.,
Anamet Laboratories, Inc., Hayward, Ca., LaserData Inc.,
Lowell, Ma., and Wang Laboratories, Inc., Lowell, Ma.; and
3) vendors that sell either OCR scanners or image scanners,
such as Xerox or Western Office Supply in Santa Clara.

Before the research questions are answered, the findings
for both optical character readers and image scanners are
summarized.
B. OPTICAL CHARACTER READER EVALUATION

To address the need to convert thesis documents into
digital format, several OCRs, including the most
technologically advanced OCRs available on the market, were
evaluated.

Two top-of-the-line OCR scanners were evaluated. The
Kurzweil 5000 was demonstrated by Western Office Supplies of
Santa Clara, Ca., and the Calera CDP 3000 was demonstrated by
Anamet Laboratories, Inc. of Hayward, Ca. Both have
self-contained processors and are designated as omni-font
readers. Both have Automatic Document Feeders (ADF) capable
of processing 50 pages at a time. With their built-in
processors, both were able to scan in the background while
the PC performed other functions.

Additionally,  True  Scan,  also  a  product  of  Calera,  was 
demonstrated  by  Western  Office  Supplies.   True  Scan  uses  an 
image  scanner,  an  extended  RAM  board  added  to  a  PC  and 
software  to  perform  OCR/ICR.   True  Scan  does  not  have  the 
capability  to  perform  background  scanning. 

Optical character readers were originally designed to
read text only. Current models advertise the ability to
read a page and distinguish between graphics and text. In a
sense this was true: they distinguished several fonts of text
but ignored any form of graphics. The choice to scan in
either image mode or text mode had to be made prior to
scanning.

If  a  single  page  contained  both  text  and  graphics,  the 
page  would  need  to  be  scanned  once  for  text  and  once  for 
graphics  and  then  to  obtain  a  digitized  copy  of  the  original 
page,  the  text  and  the  bit  map  image  of  graphics  would  need 
to  be  merged. 

This  process  may  work  well  in  an  office  environment 
where  only  a  few  documents  a  day  might  be  digitized.  But 
when  converting  large  document  databases,  the  time  to 
preview  each  page  prior  to  scanning,  scan  the  page  twice  if 
needed,  and  merge  the  text  with  graphics  would  be  too  time 
intensive  to  be  practical  for  a  large  conversion  project. 

Optical character recognition is still not an exact
science. In the opinion of the researcher, the recognition
capability of today's models is vastly improved over earlier
models, but all models previewed still made numerous
errors.
Newer models can now distinguish text printed in columns
and around graphics but still have great difficulty with
formats. A full page of text from a thesis averaged 2 to 3
character recognition errors. Recognition errors were easy
to correct, but in the researcher's experience, correcting
format errors was more time intensive.

Appendix A contains pages from the original thesis used
for research. Appendix B contains the unedited results of
scanning the same pages with a Kurzweil 5000. The Kurzweil
5000 did an excellent job of reading text, such as the small
print on page 1 of the thesis document, DD Form 1473.
However, the results also illustrate how time intensive it
would be to correct the recognition errors and to reformat
the page in an acceptable form for permanent storage.

C. IMAGE SCANNER EVALUATION

Several image scanner models were reviewed. As Table III
illustrates, the only noticeable difference between the
various models was the amount of time it took to scan a
standard 8 1/2 x 11 page. The resolution and grayscale
quality were comparable among all models.

The Fujitsu Model M3094E was used by TAB Products and
Anamet Laboratories in the integration of their optical
information systems. The Fujitsu model was rated fastest
among several commercially available models, averaging 7
seconds per page at 200 PPI resolution.

Table III  COMPARISON OF TIME REQUIRED TO SCAN A SINGLE PAGE
(resolution = 200 PPI)

SCANNER                             TIME TO SCAN A SINGLE PAGE
Hybrid (3M Docutron 9000 system)    < 1 sec
Fujitsu M3094E                      7 sec
Microtek MS 300A                    24 sec
XEROX 7650                          31 sec

D. RESEARCH QUESTIONS

Utilizing an original copy of a thesis and the optical
character recognition and image scanners discussed above,
the following questions were addressed:
1. What is the time required to convert text to data?

a. Optical Character Recognition Scanners

Time  required  to  convert  text  to  data  depended 
upon  the  processor  of  the  individual  scanner  selected.   The 
scanners  with  the  greater  built-in  processing  capability, 
such  as  the  Kurzweil  5000  or  the  Calera  CDP  3000  averaged  30 
seconds  or  less  per  page.   Using  the  True  Scan  board 
attached  to  an  Image  scanner,  the  average  time  was  60 
seconds per page. The Kurzweil 4000, using 1985 technology,
averaged 90 seconds per page.

These times do not include the time required to correct
errors, nor the time it would take to rescan the page as
graphics and attempt to combine the two. Error correction
and combining graphics could together take up to an
additional 5 minutes per page, depending on the number of
errors and the format of the graphics.

b. Image Scanners

Time  required  to  convert  an  image  to  digital 
format  was  dependent  upon  the  individual  processor  tested 
and  the  resolution  selected. 

Times for the different processors ranged from
less than 1 second per page (scanning both sides) for the
hybrid scanner designed for the 3M Docutron 9000 system to
31 seconds per page for the XEROX 7650 image scanner.
Comparisons were based on scanning at 200 PPI. (See Table
III.)

Raising or lowering the resolution correspondingly
raised or lowered the time to scan. For the XEROX 7650,
decreasing the resolution to 75 PPI decreased the time to
scan to 14 seconds. Increasing it to 300 PPI required 38
seconds, and increasing it to 400 PPI required 154 seconds.
The same page was used for all of these timings, which
demonstrate the increased time required at higher scanning
resolutions.

2. How accurate is converting to digital format?

If the document was strictly text, the accuracy rate
was quite high for OCR scanners. They rarely had more than
2 or 3 errors a page, but if there was any form of graphics,
such as figures or tables, the error rate went up
drastically.

For image scanners, accuracy is a matter of
resolution. At a resolution of 75 or 100 PPI, the quality
was good but generally not as good as the original, with
some of the smaller details harder to read. A resolution of
200 PPI was as good as or better than the original.
A resolution of 300 or 400 PPI produced a product that was
much better than the original. Appendix C demonstrates the
quality difference between pages scanned at 200 and 300 PPI.

3. What are the digital storage requirements?

Text scanned by optical character readers required
very little storage space compared to image-scanned text.
The entire 8 pages read by the Kurzweil 5000 in Appendix B
required only 15,171 bytes to store.

Image  scanning  requires  vastly  larger  amounts  of 
memory. An uncompressed page scanned at 300 PPI requires
approximately 1 megabyte of memory. A compressed page at 300
PPI  still  requires  approximately  40,000  bytes  of  storage 
space.    Table  IV  provides  a  sample  of  the  compressed  file 
sizes  required  for  individual  pages  scanned  in  Appendix  A. 
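The roughly 1-megabyte figure follows from the page dimensions. The sketch below assumes a bilevel (1 bit per pixel, black and white) bitmap of a full 8 1/2 x 11 inch page; grayscale scanning at 8 bits per pixel would need eight times as much. The function name is illustrative.

```python
def uncompressed_page_bytes(width_in, height_in, ppi, bits_per_pixel=1):
    """Size in bytes of a raw bitmap covering one page."""
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return pixels * bits_per_pixel // 8


raw_300 = uncompressed_page_bytes(8.5, 11, 300)  # 2550 x 3300 pixels
raw_200 = uncompressed_page_bytes(8.5, 11, 200)  # 1700 x 2200 pixels

print(raw_300)                      # 1051875
print(raw_200)                      # 467500
print(round(raw_300 / 40_000, 1))   # 26.3
```

At 300 PPI the raw bitmap is 1,051,875 bytes, and the 40,000-byte compressed size quoted above corresponds to a compression ratio of about 26 to 1.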

Table IV  FILE SIZES (IN BYTES) REQUIRED FOR INDIVIDUALLY
SCANNED PAGES.

Page #      200 PPI     300 PPI
Cover       13,717      20,588
1           50,405      76,600
4           24,207      36,361
17          15,526      23,834
51


VI.  ANALYSIS 

A.   THESIS  DOCUMENT  STORAGE  REQUIREMENTS 

The  area  in  the  Naval  Postgraduate  School's  Knox  library 
where  the  thesis  documents  are  stored,  is  referred  to  as  the 
thesis  cage.   It  is  so  named  because  of  the  wire  walls  that 
enclose  the  area  to  control  access. 

The thesis cage comprises an area of 23 feet by 21 feet
with a ceiling height of 8 feet. This small area, utilizing
compact shelving, contains approximately 23,500 theses. For
each thesis title stored, there is one hard bound and one
soft bound document. Therefore, there are approximately
11,750 original thesis documents, dating back to the early
1950s. Each quarter, 200 to 250 new theses are produced,
adding approximately 1,000 new thesis documents annually.

Of 20 thesis documents selected at random, the average
thesis was 108.3 pages in length, of which 27.9 pages were
graphs or charts and 7.4 pages were pictures. It is
important to note that the average thesis contained at least
25 percent graphics of some form. For this reason, OCR was
not considered a viable alternative for converting the
thesis documents and will not be considered in the remainder
of the analysis.

For the four-month period from September to December
1988, the thesis cage was used on average 4.2 times per day.
(This average does not include Sundays and school breaks
between quarters, when the library was not being utilized to
its fullest capacity.) With this in mind, one optical disk
processing workstation would be more than adequate to
fulfill the needs for search and retrieval.

Available on the market today are 12-inch optical WORM
disks holding 1.6 gigabytes per side, with two gigabytes per
side and greater being evaluated for market introduction.

Using  12  inch  optical  disks  and  a  recommended  200  PPI 
scanning  resolution  to  replace  the  11,750  thesis  documents, 
it  would  take  approximately  33  gigabytes  or  10.3  optical 
disks to store all theses currently in the cage. One thousand new
thesis documents would require 2.8 gigabytes, or
approximately  one  new  12  inch  disk  each  year. 
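The disk counts above can be checked with a short calculation. This sketch takes the 33-gigabyte total, the 108.3-page average and the 1.6-gigabyte-per-side capacity from the text, assumes both sides of each disk are used (which is what makes 33 GB come to 10.3 disks), and treats a gigabyte as 10^9 bytes; the average bytes per page is derived from those figures, not measured.

```python
import math

THESES = 11_750
PAGES_PER_THESIS = 108.3
TOTAL_GB = 33.0              # storage estimate stated in the text (200 PPI)
GB_PER_DISK = 1.6 * 2        # 12-inch WORM disk, 1.6 GB on each of two sides
NEW_THESES_PER_YEAR = 1_000

pages = THESES * PAGES_PER_THESIS              # about 1.27 million pages
bytes_per_page = TOTAL_GB * 1e9 / pages        # implied average, about 26 KB
disks_now = TOTAL_GB / GB_PER_DISK             # about 10.3 disks
annual_gb = NEW_THESES_PER_YEAR * PAGES_PER_THESIS * bytes_per_page / 1e9
disks_per_year = math.ceil(annual_gb / GB_PER_DISK)

print(f"{pages:,.0f} pages at about {bytes_per_page:,.0f} bytes each")
print(f"{disks_now:.1f} disks for the backlog ({math.ceil(disks_now)} to buy)")
print(f"{annual_gb:.1f} GB per year, {disks_per_year} new disk per year")
```

The implied average of roughly 26,000 bytes per page falls within the 200 PPI range shown in Table IV, and rounding 10.3 disks up gives the 11 disks used in the cost analysis.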
B. COST ANALYSIS

1. Optical information system cost analysis.

A system that fulfills the requirement for an
optical information system for the library is the TAB
Laser-Optic Filing System 2000. A single desk measuring eight
feet in length contains the complete system. The system
integrates the following components: one image scanner, one
high-resolution monitor, one CPU and hard disk, one 12-inch
optical disk drive, and a laser printer.

The cost of the TAB Laser-Optic Filing System 2000
with the 12-inch optical disk drive is $69,950. Each 12-inch
optical disk costs $575. Initial purchase would require 11
optical disks to store the entire thesis library, plus two
additional disks to cover the first two years of expected
additional thesis documents. Because the storage capacity of
the 12-inch optical disk is expected to increase, purchasing
more than two years in advance is not recommended.
The 13 optical disks cost $7,475. Therefore, the initial
hardware/software cost to implement the system is $77,425.

2. Hard-copy to digital conversion cost analysis.

Conversion cost is determined by the time and cost for
an individual to complete the conversion.

Conversion time consists of 1) the time to prepare
the document for scanning, such as removing staples,
2) the time to scan the document, 3) the time to index the
document for storage on the optical disk, and 4) the time
required to actually store the document on the disk.

The image scanner of the TAB Laser-Optic Filing
System 2000 can scan a page in an average of 7 seconds at
200 PPI. For an average thesis document of 108.3 pages,
scanning would take approximately 12.6 minutes per document.
With approximately 5 minutes to prepare each thesis
document, 2 minutes to index it, and less than a minute to
store it on an optical disk, the total time to prepare, scan,
index and store each thesis document is approximately 20
minutes.

Scanning one thesis document every 20 minutes equals
24 thesis documents in an eight-hour day. With one individual
assigned full time, it would take 490 days, or 98 work
weeks, to convert the entire library of 11,750 thesis
documents. Assuming an individual hired as a GS-3 to
perform the conversion, with an approximate annual salary
and benefits of $13,800, completing the initial task would
cost a total of approximately $26,000.

Converting the additional 200 to 250 new thesis
documents each quarter would require an individual for ten
working days per quarter. Assuming the same salary,
converting new documents would cost approximately $2,120
per year.

3. Summary of Cost Analysis.

The total cost to initially implement an optical
information system is $103,425: the cost of the system and
disks, $77,425, plus the initial cost of converting
currently stored documents, $26,000. Continuing to convert
documents as they arrive would cost an additional
$4,240 for the first two years.
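The cost figures can be reproduced from the stated inputs. This sketch uses the prices, scan rate, page count, 20-minute handling time and GS-3 salary given in the text; the one added assumption is a 260-day working year (52 five-day weeks), used to turn the annual salary into a daily cost. With the itemized prices, the system and disks come to $69,950 + 13 x $575 = $77,425.

```python
import math

# Figures taken from the text
SYSTEM_PRICE = 69_950        # TAB Laser-Optic Filing System 2000
DISK_PRICE = 575             # one 12-inch optical disk
DISKS = 13                   # 11 for the backlog plus 2 years of new theses
ANNUAL_SALARY = 13_800       # GS-3 salary and benefits
SECONDS_PER_PAGE = 7         # scanner speed at 200 PPI
PAGES_PER_THESIS = 108.3
BACKLOG = 11_750
MINUTES_PER_THESIS = 20      # prepare + scan + index + store, rounded

# Assumption (not in the text): 52 weeks x 5 days
WORK_DAYS_PER_YEAR = 260

hardware = SYSTEM_PRICE + DISKS * DISK_PRICE
scan_minutes = PAGES_PER_THESIS * SECONDS_PER_PAGE / 60
theses_per_day = (8 * 60) // MINUTES_PER_THESIS
backlog_days = math.ceil(BACKLOG / theses_per_day)
daily_cost = ANNUAL_SALARY / WORK_DAYS_PER_YEAR
backlog_labor = backlog_days * daily_cost
annual_labor = 4 * 10 * daily_cost   # ten working days per quarter
initial_total = hardware + backlog_labor

print(f"hardware and disks: ${hardware:,}")
print(f"scan time per thesis: {scan_minutes:.1f} min, "
      f"{theses_per_day} theses per day, {backlog_days} days")
print(f"backlog labor: ${backlog_labor:,.0f}, new theses: ${annual_labor:,.0f}/yr")
print(f"initial total: ${initial_total:,.0f}")
```

The 490 working days equal 98 five-day weeks, so the backlog labor comes to about $26,000 and the yearly labor for new theses to about $2,120, matching the rounded figures in the text.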
VII. CONCLUSIONS, RECOMMENDATIONS

A.   CONCLUSIONS 

The  design  and  implementation  of  a  hard-copy  to  digital 
format  optical  information  system  has  the  potential  of 
solving  storage  capacity  problems  not  only  for  the  NPS  Knox 
library,  but  also  for  other  document  archive  facilities  both 
in  the  Department  of  Defense  and  other  governmental  and 
civilian  agencies. 

The  technology  to  convert  hard-copy  documents  into 
digital  format  is  readily  available  today.   The  image 
scanning  optical  information  system  converts,  stores  and 
retrieves  documents  in  a  matter  of  seconds. 

The deciding issue in whether to convert hard-copy
technical documents into digital format is therefore
strictly cost: the cost of hardware and software to
implement the system, the initial cost of converting
currently stored documents, the cost to convert documents as
they arrive, and finally, the cost of maintaining the system
once it is on-line. All these costs must then be traded off
against benefits in the form of space made available for
other uses, faster search and retrieval times, and an overall
increase in use due to easier accessibility.

When  reviewing  the  performance  of  imaging  systems  in 
government,  one  can  develop  a  cost  justification  based 
on  an  agency's  savings  in  information  processing  costs 
and  storing  of  paper.   But  perhaps  the  true  bottom  line 
should  be  measured  in  terms  of  service  delivered  to  the 
public.  (Levy,  1988,  p.  6) 

B. RECOMMENDATIONS

During the researcher's analysis of the thesis cage, it
was noted that two copies of each thesis existed: one hard
bound copy and one soft bound copy. To save space
immediately, the researcher recommends removing the soft
bound documents for storage elsewhere. This would free 50
percent of the space in the thesis cage. The hard bound
thesis documents could then be treated like any other text in
the library, being recalled if another individual needs to
review a checked-out thesis.

At  the  same  time  or  in  the  future,  if  the  decision  is 
made  to  convert  to  an  optical  information  system,  the  soft 
bound  thesis  documents  could  be  used  for  scanning  without 
interrupting  the  current  storage  system. 
If the decision is to save much-needed floor space in the
library or to build a system now, purchasing an imaging optical
information system is a highly recommended alternative. Such
a system could be used not only to convert thesis documents but,
once expanded, to convert other texts in the library as well.

To reduce the cost of implementing an optical
information system, the researcher recommends a follow-on
thesis to research and build an in-house imaging optical
information system.

A question for a follow-on thesis would be the
feasibility of scanning graphics and combining them with
text during thesis preparation. This would reduce the
manual cut-and-paste work currently done, reduce the overall
storage requirements of a thesis, and eliminate the need to
scan future theses. The final digitized copy of each thesis
could then be forwarded to the library and distributed to
other government agencies at lower cost.
APPENDIX A
ORIGINAL PAGES USED FOR SCANNING RESEARCH

Appendix A contains the original pages from the sample
thesis document used for scanning research.
One of the significant problems of this
"Information age" is the production of vast amounts of
information in a form that is neither convenient nor
cost effective. This information is most often
produced and distributed on paper and the resultant
effort in production, distribution and retrieval is
herculean. A possible solution to this, is the new
optical laser technology and its use in the storage and
retrieval of large amounts of information. Through the
use of this technology in the non-classified areas of
the Department of Defense the effort in all three areas
can be greatly reduced and the end user can become more
efficient. In many areas of DOD, the greatest benefit
would be the regained space and weight associated with
the distribution of the manuals and other typically
paper products on a Compact Disc - Read Only Memory
(CD-ROM). One CD-ROM weighs less than an ounce and is
capable of storing over 270,000 pages of text. The
saved shipping and handling costs alone would be
astronomically reduced not to mention the end user who
would have a more effective and efficient product. The
CD-ROM is designed to work as a peripheral device to a
microcomputer and can therefore be made available to
any user with an IBM compatible microcomputer. The
I. INTRODUCTION/BACKGROUND

A. GENERAL

The information age is upon us. It was reported
that in 1985 the number of pages of printouts exceeded
2,000 for every man, woman, and child in America.
[Ref. 1] What will we do, especially in the military,
to meet this new era with the resources at hand? We
cannot afford to be left behind, whether by technology
or techniques. Some contemporaries have described it
as an information explosion, and yet, an explosion is a
singular, albeit powerful, event. The ground swell of
this event is better described as a snowball rolled
from the peak of the highest mountain. As it tumbles
downward, it continues to increase it's momentum as it
picks up more snow and velocity along it's path. From
our vantage point, the slope is infinite, and although
minor obstacles may be met along the way, it will
continue on and on.

B. OBJECTIVE

The objective of this thesis is actually threefold.
First, the current technological capabilities in
the area of optical laser research, as they apply to
not without cost and a sponsor was sought. The
purchase of a CD-ROM disc drive, associated hardware
and software and the cost of the services to index and
master the discs, were the major costs. Naval Supply
Systems Command in Washington, D.C. identified a need,
a prototype application and provided support funding
for this project. The hardware was purchased and the
complicated process of data formatting and transfer to
disc was accomplished. The actual object of the
resultant demonstration was a portion of an extremely
large database consisting of over 3 gigabytes (gbytes)
of information composed of over 12 million total
records. The prototype application dealt with
approximately 360 Mbytes and slightly more than 2
million records. Although a single CD-ROM can hold up
to 540 MB, the total quantity of actual data held on a
disc is often much less due to the indexing
requirements. Sophisticated indexing schemes can even
require more space than the data itself. An example of
this is Grolier Electronic Encyclopedia which requires
60 MB to accommodate the actual text of the
encyclopedia and 50 MB to accommodate the sophisticated
index. (See Figure 1)

The  desired  result  of  the  research  was  to  free  up 
the  two  large  Transaction  Ledger  On  Disc  (TLOD)  disc 
packs  each  containing  approximately  540  Mbytes  of  data 
adjacent nonreflective pits is called a land and can
also vary in its representation of data from 2 to many
bits. In CD-ROM coding, a binary one is represented by
the transition from pit to land and land to pit, and 2
or more zeros are represented by the distance between
transitions. (See Figure 4) The resultant series
lands and grooves are ultimately interpreted as one's
and zero's and thus a wide variety of digitally encoded
information can be stored on disc. When "reading" an
optical disc, a low-power laser, senses, the presence
or absence of the lands and grooves by means of
reflected light energy. The small laser beam used to
read back data is reflected from the lands, and
scattered by the pits.

Of the prerecorded discs, the CD-ROM is the most
common and draws heavily on it's predecessor, the
CD-Audio Disc, for format, wide acceptance and
manufacturing facilities. The recording format is a
spiral groove approximately 3 miles long with a
capacity of 540 MB. The tracking is maintained via the
constant linear velocity (CLV) technique which requires
variation of the disc rotation speed based on the
distance of the read head form the center of the disc.
The prerecorded disc is 4.72 inches in diameter and
it's uses are primarily in the area of database
distribution and permanent archival of vast amounts of
records. The other type of prerecorded optical disc is
the Optical Read Only Memory (OROM) which is slightly
larger than the CD-ROM. The OROM discs are generally
5.24 inches in diameter and may be formatted with
either concentric or spiral tracks. Although the
capacity of the discs is very similar, OROM is often
operated in a constant angular velocity (CAV) mode,
thus allowing for faster access times. The typical
5 1/4" floppy disc used the CAV technique. Also, OROM
may be two sided. The predominance of the CD-ROM is
most probably due to its similarity to the large
CD-Audio market, and the fact that CD-ROM is the only form
of optical-recording that, as of this writing, has an
established standard. The OROM is not expected to make
a significant impact in the near future and indeed may
be subsumed by the more dominant forms of
optical-recording. OROM will therefore not be further
addressed in this paper.

Of the two types of recordable discs, the WORM
generally uses the CAV technique and the erasable disc
technology is currently experimenting with both techniques
without a clear winner yet identified. Some of the varying
physical characteristics can be seen in the following table.
APPENDIX B
RESULTS USING AN OPTICAL CHARACTER READER

This appendix includes the pages contained in Appendix
A, scanned through a Kurzweil 5000 Intelligent Character
Recognition scanner. These pages are unedited to show
character recognition and formatting errors. Page breaks
were entered to help clarify the text that was scanned.
I .   INTRODUCTION/BACKGROUND

A. GENERAL 

The  information  age  is  upon  us.  It  was  reported  that  in 
1985  the  number  of  pages  of  printouts  exceeded  2,000  for 
every  man,  woman,  and  child  in  America.  [Ref.  1]  What  will 
we  do,  especially  in  the  military,  to-  meet  this  new  era 
with  the  resources  at  hand?  We  cannot  afford  to  be  left 
behind,  whether-  by  technology  or  techniques.  Some 
contemporaries  have-  described  it  as  an  information 
explosion,  and  yet,  an  explosion  is  a  singular,  albeit 
powerful,  event.  The  ground  swell  of  this  event  is  better 
described  a-  a  snowball  rolled  from  the  peak  of  the  highest 
mountain.  As  it  tumbles  downward,  it  continues  to  increase 
it's  momentum  as  it  picks  up  more  snow  and  velocity  along 
it's  path.   From  our  vantage  point,  the  slope  is  infinite, 

and  although  minor  obstacles  may be  met  along  the  way,  it 

will  continue  on  and  on. 

B. OBJECTIVE 

The  objective  of  this  thesis  is  actually  threefold.  First, 
the  current  technological  capabilities  in  the  area  of 
optical  laser  research,  as  they  apply  to 
not  without  cost  and  a  sponsor  was  sought.  The  purchase  of
a  CD-ROM  disc  drive,  associated  hardware  and  software  and 
the  cost  of  the  services  to  index  and  master  the  discs,  were 
the  major  costs.  Naval  Supply  Systems  Command  in 
Washington,  D.C.  identified  a  need,  a  prototype  application 
and  provided  support  funding  for  this  project.  'The 
hardware  was  purchased  and  the  complicated  process  of  data 
formatting  and  transfer  to  disc  was  accomplished.  The 
actual  object  of  the  resultant  demonstration  was  a  portion 
of  an  extremely  large  database  consisting  of  over  3 
gigabytes  (gbytes)  of  information  composed  of  over  12 
million  total  records.  The  prototype  application  dealt 
with-approximately  360  Mbytes  and  slightly  more  than  2 
million  records.  Although  a  single  CD-ROM  can  hold  up  to 
540  MB,  tee  total  quantity  of  actual  data  held  on  a  disc  is 
often  much  less  due  to  the  indexing  requirements. 
Sophisticated  indexing  schemes  can  even  require  more  space 
than  the  data  itself.  An  example  of  this  is  Grolier 
Electronic  Encyclopedia  which  requires  60  MB  to  accommodate 
the  actual  text  of  the  encyclopedia  and  SO  MB  to  accommodate 
the  sophisticated  index.  (See  Figure  1) 

The  desired  result  of  the  research  was  to  free  up  the  two 
large  Transaction  Ledger  On  Disc  (TLOD)  disc  packs  each 
containing  approximately  540  Mbytes  of  data 
adjacent  nonref lective  pits  is  called  a  land  and  can  also
vary  in  its  representation  of  data  from  2  to  many  bits.  In 
CD-ROM  coding,  a  binary  one  is  represented  by  the  transition 
from  pit  to  land  and  land  to  pit,  and  2  or  more  zeros  are 
represented  by  the  distance  between  transitions.  (See 
Figure  4)  The  resultant  series  lands  and  grooves  are 
ultimately  interpreted  as  one's  and  zero's  and  thus  a  wide 
variety  of  digitally  encoded  information  can  be  stored  on 
disc.  When  "reading"  an  optical  disc,  a  low-power  laser, - 
senses,  the  presence  or  absence  of  the  lands  and  grooves  by 
means  of  reflected  light  energy.  The  small  laser  beam  used 
to  read  back  data  is  reflected  from  the  lands,  and  scattered 
by  the  pits. 

Of  the  prerecorded  discs,  the  CD-ROM  is  the  most  common  and 
draws  heavily  on  it's  predecessor,  the  CDAudio  Disc,  for 
format,  wide  acceptance  and  manufacturing  facilities.  The 
recording  format  is  a  spiral  groove  approximately  3  miles 
long  with  a  capacity  of  540  MB..  The  tracking  is  maintained 
via  the  constant  linear  velocity  (CLV)  technique  which 
requires  variation  of  the  disc  rotation-,  speed,  based  on- 
the  distance  of  the  read  head  form  the  center  of  the  disc. 
The  prerecorded  disc  is  4.72  inches  in  diameter  and  it's 
uses  are  primarily  in  the  area  of  database  distribution  and 
permanent  archival  of  vast  amounts  of 

18 
records.   The  other  type  of  prerecorded  optical  disc  isthe
Optical  Read  Only  Memory  (OROM)  which  is  slightly  larger 
than  the  CD-. ROM.   The  OROM  discs  are  generally  5.24  inches 
in  diameter  and  may  be  formatted  with  either  concentric  or 
spiral  tracks.   Although  the  capacity  of  the  discs  is  very 
similar,  OROM  is  often  operated  in  a  constant  angular 
velocity  (CAV)  mode,  thus  allowing  for  faster  access  times. 
The  typical  5  1/4"  floppy  disc  used  the  CAV  technique. 
Also,  OROM  may  be  two  sided.   The.  predominance  of  the  CD- 
ROM  is  most  probably  due  to  its-  similarity  to  the  large 
CDAudio  market, .  and  the  fact  that  CD-ROM  is  the  only  form 
of  optical-recording  that,  as  of  this  writing,  has  an 
-  established  standard.  The  OROM  is  not  expected  to  make  a 
significant 

impact  in  the  near  future  and  indeed  may  be  subsumed  by  the 
more  dominant  forms  of  opticalrecording.   OROM  will 
therefore  not  be  further  addressed  in  this  paper. 
Of  the  two  types  of  recordable  discs,  the  WORM  generally 
uses  the  CAV  technique  and  the  erasable  disc  technology  is 
curren  tly  experimenting'-  with  both  techniques  without  a- 
clear  winner  yet  identified.   Some  of  the  varying  physical 
characteristics  can  be  seen  in  the  following  table. 
APPENDIX C
RESULTS USING AN IMAGE SCANNER

Appendix C contains the cover page and pages 1, 4, and 17
contained in Appendix A. The first four pages were scanned
at 200 PPI and the next four pages were scanned at 300 PPI.
ABSTRACT

One of the significant problems of this
"Information age" is the production of vast amounts of
information in a form that is neither convenient nor
cost effective. This information is most often
produced and distributed on paper and the resultant
effort in production, distribution and retrieval is
herculean. A possible solution to this, is the new
optical laser technology and its use in the storage and
retrieval of large amounts of information. Through the
use of this technology in the non-classified areas of
the Department of Defense the effort in all three areas
can be greatly reduced and the end user can become more
efficient. In many areas of DOD, the greatest benefit
would be the regained space and weight associated with
the distribution of the manuals and other typically
paper products on a Compact Disc - Read Only Memory
(CD-ROM). One CD-ROM weighs less than an ounce and is
capable of storing over 270,000 pages of text. The
saved shipping and handling costs alone would be
astronomically reduced not to mention the end user who
would have a more effective and efficient product. The
CD-ROM is designed to work as a peripheral device to a
microcomputer and can therefore be made available to
any user with an IBM compatible microcomputer. The
ABSTRACT

One of the significant problems of this
"Information age" is the production of vast amounts of
information in a form that is neither convenient nor
cost effective. This information is most often
produced and distributed on paper and the resultant
effort in production, distribution and retrieval is
herculean. A possible solution to this, is the new
optical laser technology and its use in the storage and
retrieval of large amounts of information. Through the
use of this technology in the non-classified areas of
the Department of Defense the effort in all three areas
can be greatly reduced and the end user can become more
efficient. In many areas of DOD, the greatest benefit
would be the regained space and weight associated with
the distribution of the manuals and other typically
paper products on a Compact Disc - Read Only Memory
(CD-ROM). One CD-ROM weighs less than an ounce and is
capable of storing over 270,000 pages of text. The
saved shipping and handling costs alone would be
astronomically reduced not to mention the end user who
would have a more effective and efficient product. The
CD-ROM is designed to work as a peripheral device to a
microcomputer and can therefore be made available to
any user with an IBM compatible microcomputer. The
GLOSSARY


Analog--Analog data is a representation of information by a
signal that varies in proportion to the amount of the
original information. Thus, the size of a signal, such as
light, is expressed by another signal, an electrical
voltage, that is proportional to the amount of light
reflected.

ANSI--American  National  Standards  Institute 

Application Development--Custom software developed
according to the user's specification that can include user
interface, data presentation and integration of the
information product into existing applications.

ASCII--American Standard Code for Information Interchange.
It is the standard table of 7-bit digital representations
used to transmit information to a printer, other computers,
or other peripheral devices.

Binary--Binary data is a representation of numerical
information that uses only two expressions. These are,
numerically, the digits "1" and "0" or, electronically, "on"
and "off." Thus, the on/off representation allows
electronic storage and manipulation of the information.


Bit--Binary  digit.  The  smallest  part  of  information  in 
binary  notation.  A  bit  is  written  as  either  1  or  0  and 
represents  either  the  on  or  off  variation  of  voltage. 

Board--A printed-circuit board, or card, that mounts onto
the physical chassis of a computer or peripheral and holds
the chips and associated wiring. Other cards may be
plugged into this board.

BPI--Bits per inch is usually used to describe the
electronic representation on a video screen; a bit is
frequently equivalent to a pixel.
Buffer--An  auxiliary  storage  area  for  data.   Many
peripherals  have  buffers  used  to  temporarily  store  data  that 
will  be  used  as  time  permits. 

Byte--A group of eight bits of digital data which is
processed together. A byte can have 256 (or 2^8) possible
combinations of 8 binary digits.

CAV--Constant Angular Velocity. A technique that spins a
disc at a constant speed, resulting in the inner disc tracks
passing the read/write head more slowly than the outer
tracks. This results in numerous tracks forming concentric
circles with the storage density being the greatest on the
inner track. (See also CLV.)

CCD--Charge-Coupled Device. A device composed of a row of
several thousand small photocells. Each pixel on the output
image corresponds to a photocell. A CCD is actually a one-chip
microcircuit.

CCITT--Acronym for the French name of the International
Telegraph and Telephone Consultative Committee. CCITT
issues the standards for data compression techniques such as
CCITT Group 3.
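CCITT Group 3 coding of a scanned page starts from run lengths: each scan line of a black-and-white image is described as alternating runs of white and black pixels, and each run length is then replaced by a variable-length code word from fixed tables. The minimal sketch below shows only the run-length step, assuming a scan line is a list of 0 (white) and 1 (black) pixels; the code tables themselves are not reproduced, and the function names are hypothetical.

```python
from typing import List

def run_lengths(scan_line: List[int]) -> List[int]:
    """Return alternating run lengths, starting with a white (0) run.

    Group 3 lines begin with a white run, which has length zero
    when the line starts with a black pixel.
    """
    runs: List[int] = []
    current = 0  # colour of the run being counted; lines start white
    count = 0
    for pixel in scan_line:
        if pixel == current:
            count += 1
        else:
            runs.append(count)  # close the run that just ended
            current = pixel     # switch colour
            count = 1
    runs.append(count)          # close the final run
    return runs

def expand_runs(runs: List[int]) -> List[int]:
    """Rebuild the scan line from its run lengths (inverse of run_lengths)."""
    line: List[int] = []
    colour = 0
    for length in runs:
        line.extend([colour] * length)
        colour = 1 - colour
    return line

line = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
print(run_lengths(line))  # [3, 2, 4, 1]
```

A mostly white text page reduces to a few long runs per scan line, which is why run-length based coding suits document images.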

CD--Compact Disc. See CD-ROM.

CDI--Compact Disc Interactive. Physically identical to the
CD-ROM disc, however, with emphasis on the interactive
presentation of video, audio, text and data. A self-contained
multimedia system expected to operate in
conjunction with home entertainment equipment.

CD ROM--See CD-ROM.

CDROM--See CD-ROM.

CD-ROM--Compact  Disc  -  Read  Only  Memory.   A  computer 
peripheral  capable  of  storing  large  amounts  of  data  which 
are  placed  on  the  disc  at  the  time  of  manufacture. 

Checksum--A method of checking the accuracy of a character
transmitted, manipulated, or stored. The checksum is the
result of the summation of all the digits involved. Used
for error detection vice error correction.
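The summation described in this entry can be sketched in a few lines. This is a minimal illustration, assuming the data is a sequence of bytes and that only the low eight bits of the sum are kept (a common convention that this glossary does not specify); the function names are hypothetical.

```python
def checksum(data: bytes) -> int:
    """Sum all bytes and keep the low 8 bits (modulo-256 checksum)."""
    return sum(data) % 256

def verify(data: bytes, expected: int) -> bool:
    """Detect, but not locate or correct, an error by recomputing the sum."""
    return checksum(data) == expected

original = b"THESIS"
stored = checksum(original)      # value saved alongside the data
corrupted = b"THESIZ"            # one character changed in transit

print(verify(original, stored))   # True
print(verify(corrupted, stored))  # False
```

A checksum reveals that something changed but not where, which is why it serves for detection rather than correction. Because addition ignores order, swapping two bytes leaves the checksum unchanged.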

Chip--The  term  applied  to  an  integrated  circuit  that 
contains  many  electronic  circuits.   A  chip  is  sometimes 
called  an  IC  or  an  IC  chip.   The  name  is  occasionally 
applied  to  the  entire  integrated  circuit  package. 

CIRC--Cross-Interleaved Reed-Solomon Code. The only error
correction scheme used with CD Audio, and the first layer
used with CD-ROM. It is implemented in the hardware, and
uses two independent R-S codes to achieve an error rate of 1
uncorrected error per 10^9 bytes.

CLV--Constant Linear Velocity (as opposed to CAV).
Used with CD-ROM to keep the data moving past the optical
head at a constant rate. In order to accomplish this, the
rotational speed of the disc must vary, decreasing as the
head moves from the inner tracks toward the outer perimeter.
The range is approximately 500 to 200 rpm for a CD-ROM disc
drive.
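The CLV speed range follows from the relation between linear and rotational speed: a point at radius r moving at linear velocity v completes v / (2πr) revolutions per second. The sketch below assumes a scanning velocity of 1.3 m/s and a data area running from a radius of about 25 mm to about 58 mm. These figures are typical of compact discs but are assumptions, not values given in this thesis.

```python
import math

def clv_rpm(linear_velocity_m_s: float, radius_m: float) -> float:
    """Rotational speed (rpm) that keeps the track moving at a fixed linear speed."""
    revolutions_per_second = linear_velocity_m_s / (2 * math.pi * radius_m)
    return revolutions_per_second * 60

V = 1.3        # assumed scanning velocity, m/s
INNER = 0.025  # assumed inner radius of the data area, m
OUTER = 0.058  # assumed outer radius of the data area, m

print(round(clv_rpm(V, INNER)))  # inner tracks: 497 rpm
print(round(clv_rpm(V, OUTER)))  # outer tracks: 214 rpm
```

Under CAV the rpm is held fixed instead, so the outer tracks pass the head faster than the inner ones.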

Code--A  method  of  representing  data  in  a  form  the  computer 
can  understand  and  use. 

Command--A code that represents an instruction for the
computer.

CRC--Cyclic Redundancy Code. An algorithm used to check
CD-ROM data after error correction has been performed; it is
capable only of error detection.
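A CRC treats the data as one long binary number and keeps the remainder left after dividing it, in modulo-2 arithmetic, by a fixed generator polynomial. The sketch below implements the widely used CRC-32 of IEEE 802.3 (reflected polynomial 0xEDB88320) one bit at a time, purely to illustrate the technique. It is an assumption for illustration, not the specific 32-bit error detection polynomial defined for CD-ROM sectors.

```python
def crc32(data: bytes) -> int:
    """Bitwise CRC-32 (IEEE 802.3, reflected form)."""
    crc = 0xFFFFFFFF                     # initial register value
    for byte in data:
        crc ^= byte                      # bring the next byte into the register
        for _ in range(8):
            if crc & 1:                  # low bit set: shift and apply the polynomial
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF              # final inversion

print(hex(crc32(b"123456789")))  # 0xcbf43926, the standard check value
```

Unlike a plain sum of bytes, a CRC changes when two bytes trade places, so it catches errors that a simple checksum would miss.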

Density--The closeness with which data is spaced on a storage
medium such as a disc.

Digital--Digital  data  is  a  representation  of  information  by 
numerals.   Thus,  the  size  of  the  electrical  voltage  is 
expressed  as  numbers:  that  is,  in  digits. 
Disc Preparation--Providing certified tapes and shipping
containers for customer data. Scanning input tapes for data
integrity and cleaning up minor problems, building a
directory (High Sierra or customer), putting the data in
proper format for the mastering center user.

DPI--Dots per inch refers to the dots, or spots, of ink
placed on paper by a printer; each may be composed of more
than one pixel.

DOS--Disk Operating System. See MS-DOS.

Double-Density--This term is most often applied to the
storage characteristics of disks, and generally refers to
the density of the storage of bits on the disk surface on
each track.

DRAW--Direct Read After Write. A write-once optical disc
technology (see also WORM). The term also names an error
control technique in which data is read back immediately
after it is written; this technique cannot be used with CD-ROM.

EBCDIC--Extended  Binary  Coded  Decimal  Interchange  Code.  An 
8-bit  code  developed  by  IBM,  and  used  primarily  by  IBM  and 
its  compatibles.   The  code  is  used  to  represent  256  numbers, 
letters  and  characters  in  a  computer  system.  (See  also 
ASCII) 
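The difference between EBCDIC and ASCII is easy to see by encoding the same text both ways. Python's standard library includes a codec for one EBCDIC code page, cp037 (US/Canada). The choice of that code page is an assumption, since EBCDIC exists in several national variants.

```python
text = "CD-ROM 540"

ascii_bytes = text.encode("ascii")    # 7-bit ASCII codes
ebcdic_bytes = text.encode("cp037")   # EBCDIC, US/Canada code page

print(ascii_bytes.hex(" "))   # 43 44 2d 52 4f 4d 20 35 34 30
print(ebcdic_bytes.hex(" "))  # c3 c4 60 d9 d6 d4 40 f5 f4 f0

# The same characters map to different byte values under the two codes,
# so data must be translated when moving between EBCDIC and ASCII systems.
assert ebcdic_bytes.decode("cp037") == text
```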

ECC--Error Correction Coding. The application or addition
of data to the original data in order to provide a means of
correction when an error in the original data is detected.

EDAC--Error Detection and Correction. Redundant information
which is calculated according to certain algorithms used to
detect and correct errors when data is read.

EDC--Error Detection Code. The application of redundant
data to the original data in order to detect errors.

GB--See Gigabyte.

Gbyte--See Gigabyte.

Giga--1,000,000,000.

Gigabyte--1,000 megabytes, or 1 billion (10^9) bytes.

Glass  Master--The  original  glass  disc  upon  which  the  digital 
information  is  burned  with  a  laser.   From  it  are  formed  the 
"stampers"  which  in  turn  are  used  to  produce  the  numerous 
discs,  usually  by  an  injection  molding  process. 

Hardware--The physical computer and all of its component
parts, as well as any peripherals and interconnecting
cables.

Head Crash--When the read-head contacts the magnetic surface
of the disk, a highly undesirable occurrence.

High Sierra Group--An ad hoc working group of CD-ROM service
companies, vendors, and manufacturers which has been a prime
source of activity in the setting of standards for CD-ROM
data format and compatibility. The group was named after
its first meeting place, the High Sierra Hotel at Lake
Tahoe. The group first met in 1985.

IC--Integrated Circuit.

Indexing--The  actual  processing  of  all  records  according  to 
the  layout  and  the  building  of  the  index  file.   Indexes 
permit  the  computer  to  rapidly  locate  data  without  searching 
through  the  full  body  of  data.   Generally,  a  data  item  is 
searchable  only  if  it  is  indexed. 
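A minimal sketch of the idea behind indexing, assuming each record is a short text string identified by a number: an inverted index maps each word to the records that contain it, so a search looks the word up directly instead of scanning every record. The record contents and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

def build_index(records: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lower-cased word to the set of record ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            index[word].add(record_id)
    return dict(index)

def search(index: Dict[str, Set[int]], word: str) -> List[int]:
    """Look the word up directly; words that were not indexed are not found."""
    return sorted(index.get(word.lower(), set()))

records = {
    1: "optical disc storage",
    2: "image scanner resolution",
    3: "optical character recognition",
}
index = build_index(records)
print(search(index, "optical"))    # [1, 3]
print(search(index, "microfilm"))  # []
```

The index itself occupies space alongside the data, which is the trade-off behind large index files on a disc.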

Indexing Set Up--Tape handling, resource allocation and
loading the layout programs on the indexing system.

Instruction--A program step that tells the computer what to
do for a single operation in a program.

Interface--A device that serves as a common boundary between
two other devices, such as two computer systems or a
computer and peripheral.

Jewel  Box--The  plastic  container  in  which  the  CD-ROM  disc  is 
generally  stored. 
Jukebox--See Optical Jukebox.

K--Abbreviation for Kilo.

KB--See Kilobyte.

Kbyte--See Kilobyte.

Kilo--A prefix meaning (1) 1000 when used in a mathematical
expression; or (2) 1,024 (2^10) when used as a unit measure in
computers. As an example, 16K would equal 16 times 1,024 or
16,384.

Kilobyte--A unit of measure in computers that equals 1024
bytes.
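The two meanings of "kilo" drift further apart as the prefixes grow. The sketch below reproduces the 16K example from the Kilo entry and, for comparison only, computes what a binary reading of "mega" (1,024 x 1,024) would give against the decimal Megabyte defined in this glossary. The binary megabyte is an illustration, not a definition this glossary uses, and the constant names are hypothetical.

```python
# Decimal and binary readings of the prefixes defined in this glossary.
KILO_DECIMAL = 1_000     # "kilo" in a mathematical expression
KILO_BINARY = 2 ** 10    # "kilo" as a unit of computer measure: 1,024
MEGA_DECIMAL = 10 ** 6   # Megabyte as defined in this glossary

print(16 * KILO_BINARY)                 # 16K = 16 x 1,024 = 16384
print(KILO_BINARY ** 2)                 # a binary "mega" would be 1048576
print(KILO_BINARY ** 2 - MEGA_DECIMAL)  # difference: 48576 bytes
```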

LAN--Local Area Network.

Land--The reflective area between two adjacent
nonreflective pits on a disc. The transition from pit to land
or land to pit represents a binary 1. (See also Run.)

M--Abbreviation  for  Mega. 

Magneto-Optic--A form of erasable media that stores
information in the form of vertically oriented magnetic
domains.

Mastering--The entire process involving the scheduling of
the mastering center, managing artwork and packaging issues
and quality assurance (QA) testing of all replicas for data
integrity and readability.

MB--See  Megabyte. 

Mbyte--See  Megabyte. 

Mega--1,000,000.

Megabyte--1,000 kilobytes, or 1 million (10^6) bytes.

Metal  Mother--The  negative  mold  created  from  the  glass 
master  which  is  in  turn  used  to  stamp  the  numerous  discs. 
Often  called  a  "stamper". 
Micron--One millionth of a meter. One square micron is
approximately the area occupied by 1 bit on a CD-ROM.

Microsecond--One millionth (1/1,000,000) of a second.

Millisecond--One thousandth (1/1,000) of a second.

MO--See  Magneto-Optic. 

MS-DOS--The disk operating system used with IBM computers
and their compatibles.

OCR--Optical  Character  Recognition.   Generally  used  in 
reference  to  a  device  capable  of  scanning  printed  material 
into  a  digital  form. 

ODS--Optical  Digital  Data  Disc 

Optical Jukebox--A store and read mechanism capable of
storing and accessing multiple CD-ROMs. Accessing is
generally accomplished by mechanical means, after which the
discs are placed on a single reader (disc drive) for use.

OROM--Optical Read Only Memory

Photoconverter--See CCD.

Photocell--A photocell is an electronic component which
changes a light signal into an electrical signal by
photoelectric conversion. A photocell is only a few microns
square.

Pit--The microscopic depression in the reflective surface of
a disc. The pattern of pits represents the data being
stored on the disc. (See also Land.) The light from the
laser used to read the data is reflected back from the
lands, but scattered by the pits. A typical pit is about
the size of a bacterium, 0.5 by 2.0 microns.

Pixel--A pixel (picture element) is the smallest
controllable element of an image. As resolution (the number
of pixels per inch) increases, pixel size decreases and
details are more accurately represented. Pixels are usually
square, but they may be rectangular or round. The shape is
determined by the optical system of the device.

PPI--Pixels per inch.

Platter--Generally used in reference to the larger (12")
optical discs. Sometimes in reference to a single layer in
a magnetic disc pack.

RAM--Random Access Memory. Semiconductor memory circuits
used to store data and programs in information processing
systems.

Resolution--Resolution is defined as the number of pixels
read or displayed per inch (PPI), both horizontally and
vertically.
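Resolution drives storage requirements: the pixel count grows with the square of the PPI, so going from 200 to 300 PPI multiplies the uncompressed size of a page by (300/200)^2 = 2.25. The sketch below assumes an 8.5 by 11 inch page scanned in black and white at 1 bit per pixel with no compression; compression such as CCITT Group 3 would reduce these figures. The function name is hypothetical.

```python
def bilevel_page_bytes(width_in: float, height_in: float, ppi: int) -> int:
    """Uncompressed size in bytes of a 1-bit-per-pixel scan of a page."""
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return (pixels + 7) // 8  # 8 pixels per byte, rounding up

for ppi in (200, 300):
    size = bilevel_page_bytes(8.5, 11, ppi)
    print(ppi, "PPI:", size, "bytes")
# 200 PPI: 467500 bytes
# 300 PPI: 1051875 bytes
```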

R/W/E--Read/Write/Erase. An alternative title for erasable
discs.

Run--The distance between transitions either from land to
pit or pit to land. The distance represents two or more
zeros. (See also Land.)
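The Land and Run entries describe how a CD-ROM represents data: a change of surface (pit to land or land to pit) is a binary 1, and the length of the unchanged stretch between changes encodes zeros. The sketch below assumes the reader samples the surface once per channel bit, giving a sequence of "P" (pit) and "L" (land) samples. It decodes channel bits only and omits the eight-to-fourteen modulation (EFM) step that a real drive performs afterward; the function names are hypothetical.

```python
from typing import List

def decode_channel_bits(samples: str) -> List[int]:
    """Turn per-clock surface samples ('P' or 'L') into channel bits.

    A change of surface since the previous sample is a 1; no change is a 0.
    The first sample only sets the reference state.
    """
    return [1 if current != previous else 0
            for previous, current in zip(samples, samples[1:])]

def runs_are_valid(bits: List[int], min_zeros: int = 2) -> bool:
    """Check that every pair of 1s is separated by at least min_zeros 0s."""
    ones = [i for i, bit in enumerate(bits) if bit == 1]
    return all(b - a - 1 >= min_zeros for a, b in zip(ones, ones[1:]))

samples = "LLLPPPPLLL"
bits = decode_channel_bits(samples)
print(bits)                  # [0, 0, 1, 0, 0, 0, 1, 0, 0]
print(runs_are_valid(bits))  # True
```

The minimum of two zeros between ones mirrors the "two or more zeros" rule in the Run entry.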

SCSI--Small Computer System Interface. A complete 8-bit
parallel interface bus structure with rates up to 4
Mbytes/sec that is subordinate to the rest of the system
architecture. Up to 8 systems and peripherals may be
connected to the same bus.

Software--A general term that applies to any program (set of
instructions) that can be loaded into a computer from any
source.

SPI--Spots per inch. See DPI.

Stamper--See Metal Mother.

Substrate--The base material from which a disc is made,
generally a strong and transparent polycarbonate plastic.
Tbyte--Terabyte, or 1,000 gigabytes (10^12 bytes).

Track--A linear, spiral or circular path on which
information is placed, or found. The portion of a disk that
one read/write head passes over to extract data. Track
density is measured in tpi (tracks per inch).

WORM--Write Once Read Many (occasionally seen as Write Once
Read "Mostly" or "Multiple").